ABSTRACT

We study the two-point correlation function of a uniformly selected sample of 4,426 luminous optical quasars
with redshift $2.9 < z < 5.4$ selected over 4041 deg$^2$ from the Fifth Data Release of the Sloan Digital Sky Survey.
We fit a power law to the projected correlation function $w_p(r_p)$ to marginalize over redshift space distortions
and redshift errors. For a real-space correlation function of the form $\xi(r) = (r/r_0)^{-\gamma}$, the fitted parameters in
comoving coordinates are $r_0 = 15.2 \pm 2.7\ h^{-1}$ Mpc and $\gamma = 2.0 \pm 0.3$, over a scale range $4 < r_p < 150\ h^{-1}$ Mpc.
Thus high-redshift quasars are appreciably more strongly clustered than their $z \sim 1.5$ counterparts, which have
a comoving clustering length $r_0 \sim 6.5\ h^{-1}$ Mpc. Dividing our sample into two redshift bins, $2.9 < z < 3.5$ and
$z > 3.5$, and assuming a power-law index $\gamma = 2.0$, we find a correlation length of $r_0 = 16.9 \pm 1.7\ h^{-1}$ Mpc for
the former, and $r_0 = 24.3 \pm 2.4\ h^{-1}$ Mpc for the latter. Strong clustering at high redshift indicates that quasars
are found in very massive, and therefore highly biased, halos. Following Martini & Weinberg, we relate the
clustering strength and quasar number density to the quasar lifetimes and duty cycle. Using the Sheth &
Tormen halo mass function, the quasar lifetime is estimated to lie in the range 4-50 Myr for quasars with
$2.9 < z < 3.5$, and 30-600 Myr for quasars with $z > 3.5$. The corresponding duty cycles are 0.004-0.05 for
the lower redshift bin and 0.03-0.6 for the higher redshift bin. The minimum mass of halos in which these
quasars reside is $2$-$3 \times 10^{12}\ h^{-1}\ M_\odot$ for quasars with $2.9 < z < 3.5$ and $4$-$6 \times 10^{12}\ h^{-1}\ M_\odot$ for quasars with
$z > 3.5$; the effective bias factor $b_{\rm eff}$ increases with redshift, e.g., $b_{\rm eff} \sim 8$ at $z = 3.0$ and $b_{\rm eff} \sim 16$ at $z = 4.5$.
Subject headings: cosmology: observations - large-scale structure of universe - quasars: general - surveys 



1. INTRODUCTION 

Recent galaxy surveys (e.g., the 2dF Galaxy Redshift Sur- 
vey, Colless et al. 2001 and the Sloan Digital Sky Survey 
(SDSS), York et al. 2000) have provided ample data for the 
study of the large-scale distribution of galaxies in the present- 
day Universe. The clustering of galaxies, which are tracers of 
the underlying dark matter distribution, gives a powerful test 
of hierarchical structure formation theory, especially when 
compared with fluctuations in the Cosmic Microwave Background.
Indeed, the results show excellent agreement with the
now-standard flat $\Lambda$-dominated concordance cosmology (e.g.,
Spergel et al. 2003, 2006; Tegmark et al. 2004, 2006; Eisenstein
et al. 2005; Percival et al. 2006). The galaxy two-point
correlation function is well fit by a power law, $\xi(r) = (r/r_0)^{-\gamma}$,
on scales $r < 20\ h^{-1}$ Mpc, with comoving correlation length
$r_0 \approx 5\ h^{-1}$ Mpc and slope $\gamma \approx 1.8$ (Totsuji & Kihara 1969;
Groth & Peebles 1977; Davis & Peebles 1983; Hawkins et al.
2003), although there is an excess above the power law below
$2\ h^{-1}$ Mpc, thought to be due to halo occupation effects
(Zehavi et al. 2004, 2005).

At high redshifts and earlier times, the dark matter cluster- 
ing strength should be weaker, but the first clustering studies 
of high-redshift galaxies with the Keck telescope (Cohen et al. 
1996; Steidel et al. 1998; Giavalisco et al. 1998; Adelberger 
et al. 1998) showed that galaxies at z > 3 show a similar co- 
moving correlation length to those of today, results that have 
since been confirmed with much larger samples (e.g., Adel- 
berger et al. 2005a; Ouchi et al. 2005; Kashikawa et al. 2006; 
Meneux et al. 2006; Lee et al. 2006; Quadri et al. 2006). This 
is indeed expected: high-redshift galaxies are thought to form 
at rare peaks in the density field, which will be strongly biased 
relative to the dark matter (Kaiser 1984; Bardeen et al. 1986); 
under gravitational instability, the bias of galaxies drops over 
time as a function of redshift (Tegmark & Peebles 1998; Blan- 
ton et al. 2000; Weinberg et al. 2004). 

Luminous quasars offer a different probe of the clustering 
of galaxies at high redshift. Powered by gas accretion onto 
central super-massive black holes (Salpeter 1964; Lynden- 
Bell 1969), quasars are believed to be the progenitors of 
local dormant super-massive black holes which are ubiqui- 
tous in the centers of nearby bulge-dominated galaxies (e.g., 
Kormendy & Richstone 1995; Magorrian et al. 1998; Yu 
& Tremaine 2002). Studies of the clustering properties of 
quasars date back to Osmer (1981); in general, quasars have a 
clustering strength similar to that of luminous galaxies at the 
same redshift (Shaver 1984; Croom & Shanks 1996; Porciani, 
Magliocchetti & Norberg 2004, hereafter PMN04; Croom et 
al. 2005). If the triggering of quasar activity is not tied to
the larger-scale environment in which their host galaxies reside,
this is not a surprising result; quasars are interpreted
as a stochastic process through which every luminous galaxy
passes, and therefore the clustering of quasars should be no 
different from that of luminous galaxies. Studies of the clus- 
tering of galaxies around quasars similarly find that quasar 
environments are similar to those of luminous galaxies (Ser- 
ber et al. 2006, and references therein), although evidence for 
an enhanced clustering of quasars on small scales (Djorgovski 
1991; Hennawi et al. 2006a; but see also Myers et al. 2006c) 
suggests that tidal effects within 100 kpc may trigger quasar 
activity. 

A number of studies have examined the redshift evolution 
of quasar clustering, but the results have been controversial: 
some papers conclude that quasar clustering either decreases 
or weakly evolves with redshift (e.g., Iovino & Shaver 1988; 
Croom & Shanks 1996), while others say that it increases with 
redshift (e.g., Kundic 1997; La Franca et al. 1998; PMN04; 
Croom et al. 2005). Myers et al. (2006a, b, c) examined 
the clustering of quasar candidates with photometric redshifts 
from the SDSS; they find little evidence for evolution in clus- 
tering strength between z ~ 2 and today. These studies also 
find little evidence for a strong luminosity dependence of the 
quasar correlation function (e.g., Croom et al. 2005; Connolly 
et al., in preparation), which is in accord with quasar models 
in which quasar luminosity is only weakly related to black 
hole mass (Lidz et al. 2006). 

The vast majority of quasars in flux-limited samples like 
the SDSS (and especially UV-excess surveys like the 2dF 
QSO Redshift Survey; Croom et al. 2004) are at relatively 
low redshift, z < 2.5. More distant quasars are intrinsically 
rarer (e.g., Richards et al. 2006), and at a given luminosity 
are of course substantially fainter. However, we might expect 
high-redshift quasars to be appreciably more biased than their 
lower-redshift counterparts. The high-redshift quasars in
flux-limited samples are intrinsically luminous, and by the
Eddington argument, are powered by massive ($> 10^8\ M_\odot$) black
holes. If the relation between black hole mass and bulge mass
(Tremaine et al. 2002 and references therein), and by extension,
black hole mass and dark matter halo mass (Ferrarese
2002) holds true at high redshift, then luminous quasars reside
in very massive, and therefore very rare, halos at high
redshift. Rare, many-$\sigma$ peaks in the density field are strongly
biased (Bardeen et al. 1986). Thus detection of particularly
strong clustering at high redshift would allow tests both of
the relationship between quasars and their host halos, and of
the predictions of biasing models. The rarity of the halos in
which quasars reside is of course related to the observed number
density of quasars and their duty cycle/lifetime; thus the
quasar luminosity function and the quasar clustering properties
can be used to constrain the average quasar lifetime $t_Q$
(Haiman & Hui 2001; Martini & Weinberg 2001), or equivalently,
the duty cycle: the fraction of time a supermassive
black hole shines as a luminous quasar.

Studies to date of the clustering of high-redshift quasars
have been hampered by small number statistics. Stephens et
al. (1997) and Kundic (1997) examined three $z > 2.7$ quasar
pairs with comoving separations 5-10 $h^{-1}$ Mpc in the Palomar
Transit Grism Survey of Schneider et al. (1994), and estimated
a comoving correlation length $r_0 \sim 17.5 \pm 7.5\ h^{-1}$ Mpc,
which is three times higher than that of lower redshift quasars.
Schneider et al. (2000) found a pair of $z = 4.25$ quasars in
the SDSS separated by less than $2\ h^{-1}$ Mpc; this single pair
implies a lower limit to the correlation length of $r_0 = 12\ h^{-1}$
Mpc. Similarly, the quasar pair separated by a few Mpc at
$z \sim 5$ found by Djorgovski et al. (2003) also implies strong
clustering at high redshift. However, measuring a true correlation
function requires large samples of quasars. At $z \sim 4$,
the mean comoving distance between luminous ($M_i < -27.6$)
quasars is $\sim 150\ h^{-1}$ Mpc (Fan et al. 2001; Richards et al.
2006), thus to build up statistics on smaller-scale clustering in
such a sparse sample requires a very large volume. The SDSS
quasar sample is the first survey of high-redshift quasars that
covers enough volume to allow this measurement to be made.

This paper presents the correlation function of high redshift
($z > 2.9$) quasars using the fifth data release (DR5; Adelman-McCarthy
et al. 2007) of the SDSS. DR5 contains ~6,000
quasars with redshift $z > 2.9$. We construct a homogeneous
flux-limited sample for clustering analysis in § 2, with special
focus on redshift determination in Appendix A and the
angular mask of the sample in Appendix B. We present the
correlation function itself in § 3, together with a discussion of
its implications for quasar duty cycles and lifetimes. We conclude
in Section 4. Throughout the paper we use the third year
WMAP + all parameters$^{11}$ (Spergel et al. 2006) for the cosmological
model: $\Omega_M = 0.26$, $\Omega_\Lambda = 0.74$, $\Omega_b = 0.0435$, $h = 0.71$,
$n_s = 0.938$, $\sigma_8 = 0.751$. Comoving units are used in distance
measurements; for comparison with previous results, we will
often quote distances in units of $h^{-1}$ Mpc.

2. SAMPLE SELECTION 
2.1. The SDSS Quasar Sample 

The SDSS uses a dedicated 2.5-m wide-field telescope
(Gunn et al. 2006) with a drift-scan camera of 30
2048 × 2048 CCDs (Gunn et al. 1998) to image the sky in five
broad bands (ugriz; Fukugita et al. 1996). The imaging data
are taken on dark photometric nights of good seeing (Hogg 
et al. 2001), are calibrated photometrically (Smith et al. 2002; 
Ivezic et al. 2004; Tucker et al. 2006) and astrometrically (Pier 
et al. 2003), and object parameters are measured (Lupton et al. 
2001; Stoughton et al. 2002). Quasar candidates (Richards et 
al. 2002b) for follow-up spectroscopy are selected from the 
imaging data using their colors, and are arranged in spectro- 
scopic plates (Blanton et al. 2003) to be observed with a pair 
of double spectrographs. The quasars observed through the 
Third Data Release (Abazajian et al. 2005) have been cata- 
loged by Schneider et al. (2005), while Schneider et al. (2006) 
extend this catalog to the DR5. In this paper, we will use results
from DR5, for which spectroscopy has been carried out
over 5740 deg$^2$. Because of the diameter of the fiber cladding,
two targets on the same plate cannot be placed closer than 55"
(corresponding to $\sim 1.2\ h^{-1}$ Mpc at $z = 3$)$^{12}$; the present paper
therefore concentrates on clustering on larger scales, and we
will present a discussion of the correlation function on small
scales in a paper in preparation.
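
The conversion between the 55" fiber separation and a comoving scale quoted above can be checked numerically. The following is a minimal sketch, assuming the flat $\Lambda$CDM parameters adopted in this paper ($\Omega_M = 0.26$, $\Omega_\Lambda = 0.74$) and neglecting radiation; the function names and the Simpson integration are illustrative choices, not the authors' code.

```python
import math

C_OVER_H0 = 2997.92458  # Hubble distance c/H0 in h^-1 Mpc (H0 = 100 h km/s/Mpc)
OMEGA_M = 0.26          # matter density adopted in the text
OMEGA_L = 0.74          # cosmological constant (flat universe)


def e_of_z(z):
    """Dimensionless Hubble rate E(z) = H(z)/H0 for flat LCDM, radiation neglected."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z, steps=1000):
    """Line-of-sight comoving distance in h^-1 Mpc, Simpson's rule (steps must be even)."""
    dz = z / steps
    total = 1.0 / e_of_z(0.0) + 1.0 / e_of_z(z)
    for k in range(1, steps):
        weight = 4.0 if k % 2 else 2.0
        total += weight / e_of_z(k * dz)
    return C_OVER_H0 * total * dz / 3.0


def transverse_separation(theta_arcsec, z):
    """Comoving transverse separation (h^-1 Mpc) of a small angle at redshift z.

    In a flat universe the transverse comoving distance equals the
    line-of-sight comoving distance, so the separation is D_C * theta.
    """
    theta_rad = math.radians(theta_arcsec / 3600.0)
    return comoving_distance(z) * theta_rad


if __name__ == "__main__":
    # Should come out close to the ~1.2 h^-1 Mpc quoted in the text.
    print(round(transverse_separation(55.0, 3.0), 2))
```

Under these assumptions the 55" limit at $z = 3$ comes out at roughly 1.2 $h^{-1}$ Mpc, consistent with the value in the text.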

The quasar target selection algorithm is in two parts:
quasars with $z < 3.5$ are outliers from the stellar locus in the
ugri color cube, while those with $z > 3.5$ are selected as outliers
in the griz color cube. The quasar candidate sample is
flux-limited to $i = 19.1$ (after correction for Galactic extinction
following Schlegel, Finkbeiner, & Davis 1998), but because
high-redshift quasars are quite rare, objects lying in
those regions of color space corresponding
to quasars at $z > 3$ are targeted to $i = 20.2$. The quasar locus
crosses the stellar locus in color space at $z \sim 2.7$ (Fan 1999),
meaning that quasar target selection is quite incomplete there
(Richards et al. 2006). For this reason, we have chosen to
define high-redshift quasars as those with $z > 2.9$.

11 http://lambda.gsfc.nasa.gov/product/map/current/params/lcdm_all.cfm

12 Serendipitous objects closer than 55" might be observed on overlapped
plates.

We draw our parent sample from the SDSS DR5 catalog. 
We have taken all quasars with listed redshift z > 2.9 from 
the DR3 quasar catalog (Schneider et al. 2005); the redshifts 
of these objects have all been checked by eye, and we rec- 
tify a small number of incorrect redshifts in the database. 
This sample contains 3,333 quasars. In addition, we have 
included all objects on plates taken since DR3 with listed 
redshift z > 2.9 as determined either from the official spec- 
troscopic pipeline which determines redshifts by measuring 
the position of emission lines (SubbaRao et al. 2002) or an 
independent pipeline which fits spectra to quasar templates 
(Schlegel et al., in preparation). We examined by eye the 
spectra of all objects with discrepant redshifts between the 
two pipelines. There are 2,805 quasars added to our sample 
from plates taken since DR3. 

Quasar emission lines are broad, and tend to show systematic
wavelength offsets from the true redshift of the object
(Richards et al. 2002a and references therein). Appendix A
describes our investigation of these effects, determination of
an unbiased redshift for each object, and the definition of our
final sample of 6,109 quasars with $z > 2.9$ (after rejecting 29
objects that turn out to have $z < 2.9$).

2.2. Clustering Subsample 

Not all the quasars in our sample are suitable for a cluster- 
ing analysis. Here we follow Richards et al. (2006) and select 
only those quasars that are selected from a uniform algorithm. 
In particular: 

• The version of the quasar target selection algorithm
used for the SDSS Early Data Release (Stoughton et al.
2002) and the First Data Release (DR1; Abazajian et al.
2003) did a poor job of selecting objects with $z \sim 3.5$.
We use only those quasars targeted with the improved
version of the algorithm, i.e., those with target selection
version no lower than v3_1_0.

• Some quasars are found using algorithms other than the 
quasar target selection algorithm described by Richards 
et al. (2002b), including special selection in the South- 
ern Galactic Cap (see Adelman-McCarthy et al. 2006) 
and optical counterparts to ROSAT sources (Anderson 
et al. 2003). The completeness of these auxiliary algo- 
rithms is poor, and we only include quasars targeted by 
the main algorithm. 

• Because quasars are selected by their optical colors, re- 
gions of sky in which the SDSS photometry is poor are 
unlikely to have complete quasar targeting. 

We now describe how the regions with poor photometry are
identified. The SDSS images are processed in a series of 10' ×
13' fields. We follow Richards et al. (2006) and mark a given
field as having bad photometry if any one of the following
criteria is satisfied:

• The r-band seeing is greater than 2".0;

• The operational database quality flag for that field is
BAD, MISSING or HOLE (only 0.15% of all DR5
fields have one of these flags set);

• The median difference between the PSF and large-aperture
photometry magnitudes of bright stars lies
more than $3\sigma$ from the mean over the entire DR5 sample
in any of the five bands;

• Any of the four principal colors of the stellar locus
(Ivezic et al. 2004) deviates from the mean of the DR5
sample by more than $3\sigma$;

• Any of the four values of the rms scatter around the
mean principal color deviates from the mean over DR5
by more than $5\sigma$, or deviates from the DR5 mean by
more than $2\sigma$ and also deviates from the mean of that
run by more than $3\sigma$. This criterion reflects the fact
that the statistics of the rms widths of the principal color
distributions per field vary significantly from run to run.
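
The criteria above amount to a simple per-field predicate. The sketch below is one way to express it, assuming a hypothetical `FieldQA` record whose entries are already expressed as deviations in units of $\sigma$; the field names are invented for illustration and do not correspond to actual columns of the runQA table.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FieldQA:
    """Hypothetical per-field quality record (names are illustrative only)."""
    seeing_r_arcsec: float               # r-band seeing in arcsec
    quality_flag: str                    # e.g. "GOOD", "BAD", "MISSING", "HOLE"
    psf_aperture_dev: Sequence[float]    # 5 bands, deviation from DR5 mean (sigma)
    color_dev: Sequence[float]           # 4 principal colors, deviation from DR5 mean (sigma)
    color_rms_dev_dr5: Sequence[float]   # 4 rms widths, deviation from DR5 mean (sigma)
    color_rms_dev_run: Sequence[float]   # 4 rms widths, deviation from the run mean (sigma)


def is_bad_field(f: FieldQA) -> bool:
    """Return True if any one of the five criteria listed in the text is met."""
    if f.seeing_r_arcsec > 2.0:
        return True
    if f.quality_flag in {"BAD", "MISSING", "HOLE"}:
        return True
    if any(abs(d) > 3.0 for d in f.psf_aperture_dev):
        return True
    if any(abs(d) > 3.0 for d in f.color_dev):
        return True
    # rms widths: >5 sigma from DR5, or >2 sigma from DR5 and >3 sigma from the run
    for dr5, run in zip(f.color_rms_dev_dr5, f.color_rms_dev_run):
        if abs(dr5) > 5.0 or (abs(dr5) > 2.0 and abs(run) > 3.0):
            return True
    return False
```

The two-part test on the rms widths is kept as a single loop so that the DR5-wide and per-run deviations of the same principal color are always compared together.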

All the information we need to identify bad fields in this way 
can be retrieved from the runQA table in the SDSS Catalogue
Archive Server (CAS$^{13}$). A total of 13.24% of the net area of
the clustering subsample is marked as bad. These bad fields 
will serve as a secondary mask in our geometry description. 
We will compute the correlation function both including and 
excluding the bad regions, to test our sensitivity to possible 
selection problems in the bad regions. 

Finally, due to overlapping plates, there are roughly 200 
duplicate objects in our parent sample, which we identified 
and removed using objects' positions. 

Our final cleaned subsample contains 4,426 quasars before 
excluding bad fields and 3,846 quasars with bad fields ex- 
cluded. Thus 13.1% of high-redshift quasars are in bad fields, 
essentially identical to the fraction of the area flagged as bad, 
which suggests that the selection of quasars in these regions is 
not terribly biased. A list of the unique high-redshift quasars
in our parent sample and in the subsample used in our clustering
analysis is provided in Table 1.

2.3. Distribution of Quasars in Angle and on the Sky 

The footprint of our quasar clustering subsample is quite
complicated. The definition of the sample's exact boundaries,
needed for the correlation function analysis which follows,
is described in detail in Appendix B. Fig. 1 shows the area
of sky from which the sample was selected in green, and the
sample of quasars is indicated as dots, with red dots indicating
objects in bad imaging fields. The total area subtended by the
sample is 4041 deg$^2$; when bad fields are excluded, the solid
angle drops to 3506 deg$^2$.

The target selection algorithm for quasars is not perfect and
the selection function depends on redshift. Our sample is limited
to $z > 2.9$; at slightly lower redshift, the broad-band colors
of quasars are essentially identical to those of F stars (Fan
1999), giving a dramatic drop in the quasar selection function.
Moreover, as discussed in Richards et al. (2006), quasars with
redshift $z \approx 3.5$ have similar colors to G/K stars in the griz
diagram and hence targeting becomes less efficient around this
redshift (as mentioned above, this problem was even worse for
the version of target selection used in the EDR and DR1). This
is reflected in the redshift distribution of our sample (Fig. 2),
which shows a dip at $z \sim 3.5$. We will use these distributions
in computing the correlation function below.

3. CORRELATION FUNCTION 

Now that we understand the angular and radial selection
function of our sample, we are ready to compute the two-point
correlation function. Doing so requires producing a random
catalog of points (i.e., without any clustering signal) with the
same spatial selection function. We will first compute the correlation
function in "redshift space" in § 3.1, then derive the
real-space correlation function in § 3.2 by projecting over redshift
space distortions. Our calculations will be done both including
and excluding the bad fields (§ 2.2); we will find that
our results are robust to this detail.

13 http://cas.sdss.org

4 Y. Shen et al.

TABLE 1
High redshift quasar sample

Plate  Fiber  MJD    RA (deg)   DEC (deg)    z      z_err  i mag  sub_flag  good_flag
1091   553    52902  0.193413     1.239112   3.741  0.011  19.74
1489   506    52991  0.214856     0.200710   3.881  0.030  19.97
1489   104    52991  0.397978    -0.701886   3.572  0.008  19.33
0387   556    51791  0.587972     0.363741   3.057  0.010  18.58
0650   111    52143  0.660070   -10.197168   3.942  0.012  19.97
0750   608    52235  0.751425    16.007709   3.689  0.011  19.50  1         1
0650   048    52143  0.763943   -10.864079   3.645  0.011  19.20
0750   036    52235  0.896718    14.795454   3.462  0.012  19.95  1         1
0750   632    52235  1.155146    15.174562   3.203  0.009  20.17  1         1
0751   207    52251  1.401625    13.997071   3.705  0.011  19.34  1         1

NOTE. The entire high redshift quasar sample with duplicate objects removed. The sub_flag is 1 when an object is in the clustering subsample, and the
good_flag is 1 for objects lying in good fields. The i magnitudes are SDSS PSF (asinh) magnitudes corrected for Galactic extinction (Schlegel et al. 1998); they
use the ubercalibration described by Padmanabhan et al. (2007), which differs slightly from that used in the official DR5 quasar catalog (Schneider et al. 2007).
The entire table is available in the electronic edition of the paper.

FIG. 1. Aitoff projection in equatorial coordinates of the angular coverage of our clustering subsample (with all fields). The center of the plot is the direction
RA = 120° and Dec = 0°. The dots indicate quasars in our clustering subsample, with red dots indicating those in bad imaging fields. The angular coverage is
patchy due to the various selection criteria described in § 2.2 and Appendix B. For example, much of the Southern Equatorial Stripe ($\delta = 0°$, $300° < \alpha < 60°$) was
targeted using the old version of the quasar targeting algorithm.



3.1. "Redshift Space" Correlation Function

We draw random quasar catalogs according to the detailed 
angular and radial selection functions discussed in the last 
section. 
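
As an illustration of the radial part of this step, random redshifts can be drawn from a binned redshift distribution by inverting its cumulative distribution. This is only a sketch under stated assumptions: the histogram passed in is a placeholder for the observed distribution, the draw is uniform within each bin, and the angular mask (which the random catalog must also follow) is not handled here.

```python
import bisect
import random


def build_cdf(counts):
    """Cumulative distribution over histogram bins, normalized to end at 1."""
    total = float(sum(counts))
    cdf, running = [], 0.0
    for c in counts:
        running += c
        cdf.append(running / total)
    return cdf


def draw_redshifts(bin_edges, counts, n, rng):
    """Draw n redshifts following a binned N(z), uniform within each bin.

    bin_edges has len(counts) + 1 entries; rng is a random.Random instance.
    """
    cdf = build_cdf(counts)
    out = []
    for _ in range(n):
        # First bin whose cumulative probability exceeds the uniform deviate;
        # empty bins are skipped automatically.
        k = bisect.bisect_right(cdf, rng.random())
        k = min(k, len(counts) - 1)  # guard against round-off at the top end
        out.append(rng.uniform(bin_edges[k], bin_edges[k + 1]))
    return out
```

A finer binning (the text later quotes $\Delta z = 0.05$ for the observed distribution) reduces the error made by treating each bin as flat.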

We start by computing the correlation function in "redshift
space", where each object is placed at the comoving distance
implied by its measured redshift and our assumed cosmology,
with no correction for peculiar velocities or redshift errors$^{14}$.
The correlation function is measured using the estimator of
Landy & Szalay (1993)$^{15}$:

$$\xi_s(s) = \frac{\langle DD \rangle - 2\langle DR \rangle + \langle RR \rangle}{\langle RR \rangle} , \qquad (1)$$

where $\langle DD \rangle$, $\langle DR \rangle$, and $\langle RR \rangle$ are the normalized numbers
of data-data, data-random and random-random pairs in each
separation bin, respectively. The results are shown in Fig. 3,
where we bin the redshift space distance $s$ in logarithmic intervals
of $\Delta \log_{10} s = 0.1$. We tabulate the results in Table 2.
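
The estimator in equation (1) can be sketched for a single separation bin as follows. The normalization convention is an assumption made for this example: `dd` and `rr` are taken to be counts of unique pairs, `dr` counts all data-random cross pairs, and each is divided by the total number of such pairs available in the catalogs.

```python
def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy-Szalay estimate of xi in one separation bin from raw pair counts.

    dd, rr : unique data-data and random-random pairs in the bin
    dr     : data-random cross pairs in the bin
    n_data, n_rand : sizes of the data and random catalogs
    """
    dd_norm = dd / (n_data * (n_data - 1) / 2.0)
    dr_norm = dr / (n_data * n_rand)
    rr_norm = rr / (n_rand * (n_rand - 1) / 2.0)
    return (dd_norm - 2.0 * dr_norm + rr_norm) / rr_norm
```

With this convention, a bin in which the data pairs occur at the same normalized rate as the random pairs gives an estimate of zero, and doubling the data-data rate alone gives an estimate of one.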
14 All calculations in this paper are done in comoving coordinates, which
is appropriate for comparing clustering results at different epochs on linear
scales. On very small, virialized scales, Hennawi et al. (2006a) argue that
proper coordinates are more appropriate for clustering analyses.

15 We found that the Hamilton (1993) estimator gives similar results.

Quasar Correlation Function at z > 2.9

FIG. 2. Observed redshift distribution of our quasar clustering subsamples,
normalized by the peak value. This distribution is the product of the
evolution of the quasar density distribution, and the quasar selection function;
the latter is responsible for the dip at $z \sim 3.5$, where quasars have very
similar colors to those of G and K stars. We show the redshift distributions
for the subsamples both including and excluding bad fields; the results are
essentially identical. The redshift binning is $\Delta z = 0.05$.

TABLE 2
Redshift space correlation function $\xi_s(s)$

s (h^-1 Mpc)  DD_mean   RR_mean     DR_mean    xi_s       xi_s error
2.244         0.0       0.9         0.0
2.825         0.0       5.4         0.0
3.557         0.0       6.3         0.0
4.477         1.8       14.4        0.9        16.5       12.8
5.637         0.0       34.2        3.6
7.096         1.8       38.7        11.7       3.54       3.61
8.934         1.8       99.0        18.0       1.26       1.88
11.25         2.7       215.0       36.9       0.663      0.733
14.16         4.5       406.5       80.0       0.191      0.786
17.83         8.9       804.2       162.4      0.131      0.472
22.44         15.2      1592.4      279.4      0.236      0.175
28.25         22.4      3123.6      607.3      -0.280     0.223
35.57         70.7      6028.6      1139.3     0.361      0.170
44.77         104.9     11959.1     2137.1     0.101      0.121
56.37         210.9     23480.2     4381.2     0.0384     0.0862
70.96         384.8     45648.7     8239.8     0.0368     0.0644
89.34         734.2     88337.9     16036.1    0.0101     0.0382
112.5         1417.1    168480.9    30636.2    0.0194     0.0250
141.6         2565.8    317727.8    57230.3    -0.00396   0.0219
178.3         4821.6    588892.8    106083.7   0.0101     0.0134
224.4         8631.8    1070807.1   192603.7   -0.00296   0.00672
282.5         15376.1   1912774.1   342706.1   0.00214    0.00953

NOTE. Result for all fields. DD_mean, RR_mean and DR_mean are the mean
numbers of quasar-quasar, random-random and quasar-random pairs within
each s bin for the ten jackknife samples. $\xi_s$ is the mean value calculated
from jackknife samples, and the error quoted is that from the jackknifes as
well.

There are various ways to estimate the statistical errors in
the correlation function (e.g., Hamilton 1993), including bootstrap
resampling (e.g., PMN04), jackknife resampling (e.g.,
Zehavi et al. 2005), and the Poisson estimator (e.g., Croom
et al. 2005; da Angela et al. 2005). In this paper we will focus
on the latter two methods. For the jackknife method, we
split the clustering sample into 10 spatially contiguous subsamples,
and our jackknife samples are created by omitting
each of these subsamples in turn. Therefore, each of the jackknife
samples contains 90% of the quasars, and we use each to
compute the correlation function. The covariance error matrix
is estimated as

$$\mathrm{Cov}(\xi_i, \xi_j) = \frac{N-1}{N} \sum_{l=1}^{N} \left(\xi_i^l - \bar{\xi}_i\right)\left(\xi_j^l - \bar{\xi}_j\right) , \qquad (2)$$

where $N = 10$ in our case, the subscript denotes the bin number,
and $\bar{\xi}_i$ is the mean value of the statistic $\xi_i$ over the jackknife
samples (not surprisingly, we found that $\bar{\xi}_i$ was very
close to the correlation function for the whole clustering sample,
for all bins $i$). Our sample is sparse, thus the off-diagonal
elements of the covariance matrix are poorly determined, so
we use only the diagonal elements of the covariance matrix
in the $\chi^2$ fits below. We also carried out fits keeping those
off-diagonal elements for adjacent and separated-by-two bins,
and found similar results.
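
Equation (2) translates directly into a few lines of code. The sketch below assumes the correlation function has already been measured in each jackknife sample and stored as a list of per-bin values; the function name and data layout are illustrative.

```python
def jackknife_covariance(samples):
    """Jackknife mean and covariance of a binned statistic.

    samples[l][i] is the value in bin i measured from jackknife sample l.
    Returns (mean, cov) with cov[i][j] = (N-1)/N * sum_l (x_i^l - mean_i)(x_j^l - mean_j).
    """
    n = len(samples)
    nbins = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(nbins)]
    cov = [[0.0] * nbins for _ in range(nbins)]
    for i in range(nbins):
        for j in range(nbins):
            acc = sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples)
            cov[i][j] = (n - 1) / n * acc
    return mean, cov
```

The one-sigma error on bin $i$ used in a diagonal-only fit is then the square root of `cov[i][i]`.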

For the Poisson error estimator (e.g., Kaiser 1986), valid
for sparse samples in which a given quasar is unlikely to
take part in more than one pair, the error is estimated as
$\Delta\xi = (1 + \xi)/\sqrt{\min(N_{\rm pair}, N_{\rm QSO})}$, where $N_{\rm pair}$ is the number
of unique quasar-quasar pairs in our real quasar sample in the
bin in question, and $N_{\rm QSO}$ is the total number of real quasars in
our sample (e.g., da Angela et al. 2005). The Poisson estimator
breaks down on large scales, as the pairs in different bins
become correlated. Fig. 3 shows the two error estimators; the
two methods give similar results.
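
For reference, the Poisson error described above is a one-line computation. This is a minimal sketch; the guard against empty bins is an added assumption, since the text does not say how bins without pairs are treated.

```python
import math


def poisson_error(xi, n_pair, n_qso):
    """Poisson error on xi: (1 + xi) / sqrt(min(N_pair, N_QSO))."""
    n_eff = min(n_pair, n_qso)
    if n_eff <= 0:
        raise ValueError("Poisson error is undefined for a bin with no pairs")
    return (1.0 + xi) / math.sqrt(n_eff)
```

On small scales the pair count is the smaller number and sets the error; once a bin holds more pairs than there are quasars, the quasar count takes over, which caps how far the estimated error can shrink.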

The correlation function lies above unity for scales below
$\sim 10\ h^{-1}$ Mpc; it is clear that the clustering signal is much
stronger than that of low-redshift quasars (e.g., Croom et al.
2005; Connolly et al. 2006). Fig. 3 also shows the results of
a $\chi^2$ fit of a power-law correlation function $\xi_s(s) = (s/s_0)^{-\delta}$ to
the data with $4 < s < 150\ h^{-1}$ Mpc. The clustering signal is
negative in the $s = 28.25\ h^{-1}$ Mpc bin; Table 2 shows a smaller
number of quasar-quasar pairs than expected. This point appears
to be an outlier, as the expected correlation function
should be positive on these scales; this discrepancy may be
due to the paucity of quasars in the sample at $z \sim 3.5$. We
have carried out fits to $\xi_s(s)$ both including and not including
this data point (Table 4); we find it makes little difference.
In particular, neglecting the point at $28.25\ h^{-1}$ Mpc, we find
$s_0 = 10.2 \pm 3.1\ h^{-1}$ Mpc and $\delta = 1.71 \pm 0.43$ for the Poisson
errors, and $s_0 = 10.4 \pm 3.0\ h^{-1}$ Mpc and $\delta = 1.73 \pm 0.46$ for
the jackknife method. When we include this negative data
point, we find $s_0 = 10.4\ h^{-1}$ Mpc and $\delta = 2.07$ for the jackknife
method. Table 4 also includes the $\chi^2$/dof for these fits;
in all cases, it is less than unity, due to our neglecting the
off-diagonal elements in the covariance matrix. However, as
Figure 3 makes clear, the majority of the points lie within
$1\sigma$ of the fitted power law.
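
A diagonal-covariance $\chi^2$ fit of the power law can be sketched with a simple grid search. The minimizer actually used for the fits above is not stated in the text, so the grid search here is purely an illustrative choice; it evaluates the model directly rather than fitting a line in log space, which would not accommodate negative measurements.

```python
def power_law(s, s0, delta):
    """Power-law correlation function xi(s) = (s / s0) ** (-delta)."""
    return (s / s0) ** (-delta)


def chi2(params, s_vals, xi_vals, sigmas):
    """Chi-square using only the diagonal errors, as in the text."""
    s0, delta = params
    return sum(((xi - power_law(s, s0, delta)) / sig) ** 2
               for s, xi, sig in zip(s_vals, xi_vals, sigmas))


def grid_fit(s_vals, xi_vals, sigmas, s0_grid, delta_grid):
    """Return (chi2_min, s0, delta) over a rectangular parameter grid."""
    best = None
    for s0 in s0_grid:
        for delta in delta_grid:
            value = chi2((s0, delta), s_vals, xi_vals, sigmas)
            if best is None or value < best[0]:
                best = (value, s0, delta)
    return best
```

In practice the grid spacing should be fine compared with the parameter uncertainties, and the $\Delta\chi^2$ contours around the minimum give the quoted errors.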

Using good fields only yields similar results for bins where
there are more than 20 real quasar pairs (i.e., $s > 20\ h^{-1}$ Mpc).
On scales below $20\ h^{-1}$ Mpc there are very few quasar-quasar
pairs in each bin, and the signal-to-noise ratio is very low.
The fitting results (over scale range $4 < s < 150\ h^{-1}$ Mpc) are:
$s_0 = 12.7 \pm 3.3\ h^{-1}$ Mpc and $\delta = 1.64 \pm 0.31$ for the Poisson
errors; $s_0 = 10.3 \pm 3.0\ h^{-1}$ Mpc and $\delta = 1.43 \pm 0.28$ for the
jackknife errors.

To study the large scale behavior of $\xi_s(s)$ we compute $\xi_s(s)$
up to $s = 2000\ h^{-1}$ Mpc on a linear grid with $\Delta s = 20\ h^{-1}$
Mpc, using all the fields. The result is shown in Fig. 4, and
errors are estimated using the Poisson estimator. For scales
$200 < s < 2000\ h^{-1}$ Mpc, the mean value of $\xi_s(s)$ is 0.002,
with an rms scatter of ±0.01 (see also Roukema, Mamon &
Bajtlik 2002 and Croom et al. 2005). Thus there is no clear
evidence for correlations on scales above $200\ h^{-1}$ Mpc.

FIG. 3. Redshift space correlation function $\xi_s(s)$ for quasars with $z > 2.9$ (all fields included). Statistical errors are estimated using the Poisson estimator
(left) and jackknife estimator (right). The two estimators give comparable results. Also plotted are the best fitted power-law functions, with fitted parameters
listed in Table 4.

FIG. 4. Large scale behavior of $\xi_s(s)$ for the $z > 2.9$ quasars (all fields included). Errors are estimated using the Poisson estimator. The redshift space
correlation function essentially vanishes for $s > 200\ h^{-1}$ Mpc, with a mean of 0.002 and rms scatter ±0.01 in the range $200 < s < 2000\ h^{-1}$ Mpc.



3.2. The Real Space Correlation Function 

Appendix A shows that the uncertainty in measurements
of the quasar redshifts is substantial, $\Delta z \approx 0.01$, giving an
uncertainty in the comoving distance of a $z = 3.5$ quasar of
$\sim 6\ h^{-1}$ Mpc. This, together with peculiar velocities on large
and small scales, systematically biases the correlation function
(e.g., Kaiser 1987). To determine the real-space correlation
function, we follow standard practice and compute the correlation
function on a two-dimensional grid of pair separations
parallel ($\pi$) and perpendicular ($r_p$) to the line of sight. Our
grid has a logarithmic increment of 0.15 along the $r_p$ direction
and a linear increment of $5\ h^{-1}$ Mpc along the $\pi$ direction. As
above, the two dimensional correlation function $\xi_s(r_p, \pi)$ is
estimated using the Landy & Szalay (1993) estimator, equation (1).
Redshift errors and peculiar velocities affect the separation
along the $\pi$ direction but not along the $r_p$ direction.
Therefore we project out these effects by integrating $\xi_s(r_p, \pi)$
along the $\pi$ direction to obtain the projected correlation function
$w_p(r_p)$:

$$w_p(r_p) = 2 \int_0^{\infty} d\pi\, \xi_s(r_p, \pi) . \qquad (3)$$
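
As a rough check of the distance uncertainty quoted at the start of this section, the redshift error can be propagated through the derivative of the comoving distance. This sketch assumes the flat $\Lambda$CDM parameters adopted in the paper ($\Omega_M = 0.26$, $\Omega_\Lambda = 0.74$) and neglects radiation:

$$\Delta D_C \simeq \frac{c}{H_0} \frac{\Delta z}{E(z)} , \qquad E(z) = \sqrt{\Omega_M (1+z)^3 + \Omega_\Lambda} ,$$

$$E(3.5) = \sqrt{0.26 \times 4.5^3 + 0.74} \approx 4.94 , \qquad \Delta D_C \approx \frac{2998\ h^{-1}\,{\rm Mpc} \times 0.01}{4.94} \approx 6.1\ h^{-1}\ {\rm Mpc} ,$$

which agrees with the $\sim 6\ h^{-1}$ Mpc stated in the text and is comparable to the 5 $h^{-1}$ Mpc bin width adopted along the $\pi$ direction.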



In practice we integrate up to a cutoff value of $\pi_{\rm cutoff} = 100\,h^{-1}$ Mpc,
which includes most of the clustering signal without being dominated by noise. This
value of $\pi_{\rm cutoff}$ is larger than the values of 40-70 $h^{-1}$ Mpc typically
used in clustering analyses for galaxies and low-redshift quasars (e.g., Zehavi et al.
2005, PMN04, da Angela et al. 2005) because of the substantially stronger clustering of
high-redshift quasars. We verify that our results are not sensitive to the precise value
of $\pi_{\rm cutoff}$ we adopt.
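The two numerical ingredients of this subsection can be sketched in a few lines of Python. This is an illustration only, not the pipeline used for the measurements: the flat $\Lambda$CDM parameters (`omega_m = 0.26`, `omega_lambda = 0.74`) are assumed values, the function names are hypothetical, and the $\xi_s$ values in the usage lines are made up. The sketch converts a redshift error into a line-of-sight comoving distance error, $c\,\Delta z / H(z)$, and approximates equation (3) by a sum over linear $\pi$ bins truncated at $\pi_{\rm cutoff}$.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def los_distance_error(z, delta_z, omega_m=0.26, omega_lambda=0.74):
    """Comoving line-of-sight error in h^-1 Mpc: c * delta_z / H(z).

    Assumes a flat LambdaCDM model; the default parameters are assumptions.
    """
    # H(z) in km/s per h^-1 Mpc
    hubble = 100.0 * math.sqrt(omega_m * (1.0 + z) ** 3 + omega_lambda)
    return C_KM_S * delta_z / hubble


def projected_wp(xi_row, d_pi=5.0, pi_cutoff=100.0):
    """Approximate w_p(r_p) = 2 * int_0^pi_cutoff xi_s(r_p, pi) d(pi).

    xi_row holds xi_s(r_p, pi) in consecutive linear pi bins of width d_pi
    (h^-1 Mpc), starting at pi = 0; bins beyond the cutoff are ignored.
    """
    n_bins = int(round(pi_cutoff / d_pi))
    return 2.0 * d_pi * sum(xi_row[:n_bins])


# Redshift error of 0.01 at z = 3.5
sigma_pi = los_distance_error(3.5, 0.01)
print(f"line-of-sight smearing: {sigma_pi:.1f} h^-1 Mpc")

# Made-up xi_s values in 30 pi bins for one r_p bin; only the first 20 are used
toy_row = [0.1] * 30
print(f"w_p = {projected_wp(toy_row):.1f} h^-1 Mpc")
```

With the assumed parameters the first number comes out at roughly 6 $h^{-1}$ Mpc, in line with the value quoted above; the smearing is comparable to the 5 $h^{-1}$ Mpc bin width in $\pi$, which is one reason to integrate well beyond a single bin.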
The projected correlation function $w_p$ is related to the real-






FIG. 5. Projected correlation function $w_p(r_p)$ for the $z > 2.9$ quasars. Errors are estimated using the jackknife method. Also plotted are the best-fit
power-law functions, with fitted parameters listed in Table 4. Left: all fields; right: good fields only. The two cases give similar results.

FIG. 6. Correlation functions of 23,283 SDSS DR5 quasars with $0.8 < z < 2.1$ in all fields. Errors are estimated using the jackknife method. Left: redshift space
correlation function; right: projected correlation function. Also plotted are the best-fit power-law functions, with fitted parameters listed in Table 4.



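space correlation function $\xi(r)$ through

$$ w_p(r_p) = 2 \int_{r_p}^{\infty} \frac{r\,\xi(r)}{(r^2 - r_p^2)^{1/2}} \, dr \tag{4} $$

(e.g., Davis & Peebles 1983).
If $\xi(r)$ follows the power-law form $\xi(r) = (r/r_0)^{-\gamma}$, then:

$$ \frac{w_p(r_p)}{r_p} = \frac{\Gamma(1/2)\,\Gamma[(\gamma - 1)/2]}{\Gamma(\gamma/2)} \left( \frac{r_0}{r_p} \right)^{\gamma} \tag{5} $$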
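As a consistency check on equations (4) and (5), the sketch below evaluates the closed-form power-law result and compares it with a direct numerical integration of equation (4). The function names and the integration settings (`u_max`, `n_steps`) are illustrative choices, not part of the analysis described here.

```python
import math


def wp_power_law(r_p, r0, gamma):
    """Equation (5): w_p(r_p) for xi(r) = (r / r0) ** -gamma (requires gamma > 1)."""
    amplitude = (math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0)
                 / math.gamma(gamma / 2.0))
    return r_p * amplitude * (r0 / r_p) ** gamma


def wp_numeric(r_p, r0, gamma, u_max=40.0, n_steps=20000):
    """Equation (4) by midpoint integration.

    The substitution r = r_p * cosh(u) removes the integrable singularity at
    r = r_p and turns the integral into 2 * int_0^inf r * xi(r) du.
    """
    du = u_max / n_steps
    total = 0.0
    for i in range(n_steps):
        r = r_p * math.cosh((i + 0.5) * du)
        total += r * (r / r0) ** (-gamma)
    return 2.0 * total * du


for gamma in (2.0, 1.58):
    analytic = wp_power_law(10.0, 15.2, gamma)
    numeric = wp_numeric(10.0, 15.2, gamma)
    print(f"gamma = {gamma}: analytic = {analytic:.4f}, numeric = {numeric:.4f}")
```

For $\gamma = 2$ the prefactor reduces to $\Gamma(1/2)^2 / \Gamma(1) = \pi$, so $w_p(r_p) = \pi r_0^2 / r_p$, which is a convenient hand check.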

We show our results for $w_p(r_p)$ in Fig. 5, where the errors are estimated using the
jackknife method. Tabulated values for $w_p$ are listed in Table 3 for the all-fields
case. We only use data points where the mean number of quasar-quasar pairs in the $r_p$
bin is more than 10, and we therefore restrict our fits to scales
$4 < r_p < 150\,h^{-1}$ Mpc. The parameters of the best-fit power law for the all-fields
case are $r_0 = 16.1 \pm 1.7\,h^{-1}$ Mpc and $\gamma = 2.33 \pm 0.32$ when the negative
data point at $r_p = 18.84\,h^{-1}$ Mpc is excluded. When this negative data point is
included in the fit we get $r_0 = 13.6 \pm 1.8\,h^{-1}$ Mpc and an unusually large
$\gamma = 3.52 \pm 0.87$, which is caused by the drag of the negative point on the
fit.$^{16}$ Using good fields only yields $r_0 = 15.2 \pm 2.7\,h^{-1}$ Mpc and
$\gamma = 2.05 \pm 0.28$, shown in the lower panel of Fig. 5. Note that the real-space
correlation function indicates appreciably stronger clustering than does its counterpart
in redshift space; the large redshift errors spread structures out in redshift space,
diluting the clustering signal.

$^{16}$ For the good-fields case the projected correlation function is positive
over the full range that we fit.

Projected correlation function $w_p(r_p)$

the mean numbers of quasar-quasar, random-random and quasar-random
pairs within each $r_p$ bin for the ten jackknife samples. $w_p(r_p)/r_p$ is the
mean value calculated from the jackknife samples.

We have already indicated that the clustering signal is appreciably stronger than at
lower redshift. To check that this is not somehow an artifact of our processing, we
selected a sample of 23,283 spectroscopically confirmed quasars with $0.8 < z < 2.1$ from
the SDSS DR5, with the same selection criteria as we used above (§ 2.2). Figure 6 shows
the resulting $\xi_s(s)$ and $w_p(r_p)$; to compare with the results of other authors
(e.g., da Angela et al. 2005; Connolly et al. 2006), we integrated to
$\pi_{\rm cutoff} = 70\,h^{-1}$ Mpc. We fit power laws over the range
$1 < s < 100\,h^{-1}$ Mpc (Croom et al. 2005) for $\xi_s(s)$, and
$1.2 < r_p < 30\,h^{-1}$ Mpc for $w_p(r_p)$ (PMN04 and da Angela et al. 2005). The fitted
power-law parameters are: $s_0 = 6.36 \pm 0.89\,h^{-1}$ Mpc and $\delta = 1.29 \pm 0.14$
for $\xi_s(s)$; $r_0 = 6.47 \pm 1.55\,h^{-1}$ Mpc and $\gamma = 1.58 \pm 0.20$ for
$w_p(r_p)$. These results are in excellent agreement with Croom et al. (2005), PMN04 and
da Angela et al. (2005) based on the 2QZ sample, and Connolly et al. (2006) based on the
SDSS sample. Note that the 2QZ papers use a slightly different cosmology, which causes
very little difference. More importantly, the 2QZ sample is at lower mean luminosity than
the SDSS sample, although there is only a mild luminosity dependence of the clustering
strength (e.g., Lidz et al. 2006; Connolly et al. 2006). We note that the amplitude of
$w_p(r_p)$ for $r_p > 30\,h^{-1}$ Mpc is lower than predicted from the power-law fit,
which is also the case in da Angela et al. (2005, Fig. 2).

The predicted correlation function of the underlying dark matter at $r = 15\,h^{-1}$ Mpc
is $\sim 0.014$ at $z = 3.5$ (see § 3.3 and Appendix C), far below that of the current
high redshift quasar sample (Fig. 5), indicating that our high-redshift quasar sample is
very strongly biased.

The increase in clustering signal with redshift we have seen suggests that we may be able
to see redshift evolution within our sample. We divide our clustering sample into two
subsamples with redshift intervals $2.9 < z < 3.5$ and $z > 3.5$. The resulting
$w_p(r_p)$ are shown in Fig. 7. The higher redshift bin shows systematically stronger
clustering than does the lower redshift bin. The fitted parameters are:
$r_0 = 16.0 \pm 1.8\,h^{-1}$ Mpc and $\gamma = 2.43 \pm 0.43$ for $2.9 < z < 3.5$; and
$r_0 = 22.5 \pm 2.5\,h^{-1}$ Mpc and $\gamma = 2.28 \pm 0.31$ for $z > 3.5$, where the
fitting range is 4-150 $h^{-1}$ Mpc. Using good fields only yields:
$r_0 = 17.9 \pm 1.5\,h^{-1}$ Mpc and $\gamma = 2.37 \pm 0.29$ for $2.9 < z < 3.5$;
$r_0 = 25.2 \pm 2.5\,h^{-1}$ Mpc and $\gamma = 2.14 \pm 0.24$ for $z > 3.5$. When we fix
the power-law index to be $\gamma = 2.0$ we get slightly different but consistent
correlation lengths for each case (Table 4). Indeed, the clustering of quasars increases
strongly with redshift over the range probed by our sample.

FIG. 7. Clustering evolution of high redshift quasars. Errors are estimated using the jackknife method. Black indicates the $2.9 < z < 3.5$ bin and red indicates
the $z > 3.5$ bin. Also plotted are the best-fit power-law functions, with fitted parameters listed in Table 4. Left: all fields; right: good fields only. Both cases
show stronger clustering in the higher redshift bin.

The increase in clustering strength with redshift may be due to two effects: an
ever-increasing bias with redshift of the halos hosting quasars of fixed luminosity, and
luminosity-dependent clustering. The higher-redshift quasars are more luminous (Table 6
and Fig. 17 of Richards et al. 2006), and may be associated with more massive halos. At
low redshift ($z < 3$) and moderate luminosities, luminosity depends on accretion rate as
much as black hole mass, and one expects
TABLE 4

Summary of the fitting parameters of the correlation function

redshift          case              form                    $s_0/r_0$ ($h^{-1}$ Mpc)   $\delta/\gamma$   $\chi^2$/dof   $r_0$ ($\delta, \gamma = 2.0$)   $\chi^2$/dof
2.9 < z < 3.5     all, jackknife    $(r/r_0)^{-\gamma}$     16.02 ± 1.81               2.43 ± 0.43       0.43           …                                0.52
                  good, jackknife   $(r/r_0)^{-\gamma}$     17.91 ± 1.51               2.37 ± 0.29       0.46           16.90 ± 1.73                     0.56
z > 3.5           all, jackknife    $(r/r_0)^{-\gamma}$     22.51 ± 2.53               2.28 ± 0.31       0.50           20.68 ± 2.52                     0.52
                  good, jackknife   $(r/r_0)^{-\gamma}$     25.22 ± 2.50               2.14 ± 0.24       0.32           24.30 ± 2.36                     0.32
0.8 < z < 2.1     all, jackknife    $(s/s_0)^{-\delta}$     6.36 ± 0.89                1.29 ± 0.14       0.88
                  all, jackknife    $(r/r_0)^{-\gamma}$     6.47 ± 1.55                1.58 ± 0.20       0.88

NOTE. Fitting results for various cases and different redshift bins. The case column indicates whether the correlation function is measured from all fields
or from good fields only; it also indicates the error estimator. $\xi_s(s)$ is the redshift space correlation function, while $\xi(r)$ is the real space correlation function. The
last two columns give the correlation length and reduced $\chi^2$ for the fixed power-law index fits for selected cases.
a Data points with negative correlation function are included in the fit.



little dependence of clustering strength on luminosity (Lidz 
et al. 2006), as observed (Croom et al. 2005; Connolly et al. 
2006). However, the high-luminosity high redshift quasars in 
our sample have close to Eddington luminosities (Kollmeier 
et al. 2006), and therefore we may well expect a strong de- 
pendence of the clustering signal on luminosity (Hopkins et 
al. 2006). We are limited by the relatively small size of our 
sample to date, and will explore the dependence of clustering 
strength with luminosity in a future paper. 

Figure 8 shows the evolution of the comoving correlation length $r_0$ as a function of
redshift, where the data points for low redshift bins (gray triangles) are taken from
Porciani & Norberg (2006, the 2QZ sample). Data points for the SDSS quasar sample in this
paper are denoted as filled squares, placed at the mean redshifts for each redshift bin.
The black square is for the $0.8 < z < 2.1$ SDSS quasars, taken from the variable
power-law index fit; the red and green squares are for the $2.9 < z < 3.5$ bin and the
$z > 3.5$ bin (with $\gamma$ fixed to 2.0), both for the all fields case and the good
fields case. There are many factors that affect the fitted value of $r_0$: the 2QZ and
the SDSS samples probe different luminosities, the ranges of scales over which the power
law is fit are different, and the power-law indices $\gamma$ are different. Nevertheless,
this figure demonstrates that the clustering length of quasars increases dramatically
with redshift.

3.3. Quasar Lifetime, Halo Mass, and Bias 

The clustering of quasars and their space density can be used to constrain the average
quasar lifetime $t_Q\,^{17}$ and the bias of the dark matter halos in which they sit
(Martini & Weinberg 2001; Haiman & Hui 2001). In this section, we follow Martini &
Weinberg (2001); the essential formulas are presented in Appendix C. The basic
assumptions are that: 1) luminous quasars only reside in dark matter halos with mass
above some threshold mass $M_{\rm min}$; 2) those dark matter halos with
$M > M_{\rm min}$ host at most one active quasar at a time. The probability that such a
halo harbors an active quasar is the duty cycle $t_Q/t_H$, where $t_H$ is the halo
lifetime, given by eqn. (C6). Assumptions (1) and (2) include the assumption that every
dark matter halo harbors a supermassive black hole, either active or dormant, and that
the resulting quasars have the same clustering strength as their hosting halos.

$^{17}$ Here we define $t_Q$ to be the total time that an accreting supermassive
black hole has a UV luminosity above the luminosity threshold of our sample.
If the black hole is as old as its host dark matter halo, then the duty cycle
$t_Q/t_H$ is the probability that we observe a quasar in this halo. Indeed, while
the equations in Appendix C show that the directly constrained quantity is
the duty cycle, the quantity $t_Q$ indicates how much time a supermassive black
hole spends during the luminous accretion phase as it assembles most of its
mass.

FIG. 8. The evolution of the comoving correlation length $r_0$ as a function
of redshift. Gray triangles are 2QZ data points taken from Porciani & Norberg
(2006, Column 7 in their Table 3). The black square is for the $0.8 < z < 2.1$
SDSS quasars, taken from the variable power-law index fit; the red and green
squares are for the $2.9 < z < 3.5$ bin and the $z > 3.5$ bin, for the all fields and
the good fields cases respectively, taken from the fixed $\gamma = 2.0$ fits.

We note that the Martini & Weinberg approach is appropriate for high redshift quasars
because at low redshift ($z < 2$), the occurrence of quasar activity is determined by
fuelling as well, rather than by the mere existence of a dark matter halo. At low
redshift, therefore, the probability that a halo harbors an active quasar is the duty
cycle $t_Q/t_H$ times the (unknown) probability that a halo harbors an active or dead
quasar.

The value of $M_{\rm min}(z)$ is related to the quasar lifetime and the observed quasar
spatial density $\Phi(z)$ integrated over the survey magnitude range (having corrected
for the selection function, of course):

$$ \Phi(z) = \int_{M_{\rm min}}^{\infty} dM \, \frac{t_Q}{t_H(M,z)} \, n(M,z) \tag{6} $$



where we set the duty cycle $t_Q/t_H$ equal to unity in the integration when
$t_Q > t_H$, and $n(M,z)$ is the dark matter halo mass function. Here, we follow Sheth &
Tormen (1999) to compute $n(M,z)$. Given $\Phi(z)$ and an assumed constant $t_Q$, we can
determine $M_{\rm min}(z)$ from equation (6) and hence the effective bias
$b_{\rm eff}(M_{\rm min}, z)$ from equation (C8), for which we have used the analytical
bias formalism in Jing (1998). We have checked the accuracy of the analytical bias model
using the results of a cosmological N-body simulation by Paul Bode and Jeremiah P.
Ostriker (Bode 2006, private communication). At the simulation output redshifts, $z = 3$
and $z = 4$, the bias factor depends on scale. However, because we integrate over a range
of scales (see Eq. 7 below), the scale-independent analytical bias formalism provides an
adequate prescription (see further discussion in Appendix C). More importantly, the
analytic form allows us to interpolate the bias with redshift, which is needed to predict
the observed correlation function (equation C11). Fig. 9 shows $n(M,z)$, $t_H(M,z)$ and
$b_{\rm eff}(M,z)$ as functions of halo mass $M$ (in units of $h^{-1}\,M_\odot$) at
redshift $z = 3$, 3.5, and 4 for our standard cosmology.
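The inversion of equation (6) for $M_{\rm min}$ can be sketched as a one-dimensional root search. The example below is a toy illustration under loud assumptions: `toy_mass_function` and `toy_halo_lifetime` are hypothetical placeholders, not the Sheth & Tormen (1999) mass function or the halo lifetime of equation (C6), and the target density is an arbitrary illustrative number. Only the structure (integrate the capped duty cycle times the mass function, then bisect on $M_{\rm min}$) mirrors the procedure described above.

```python
import math


def toy_mass_function(m):
    """Hypothetical dn/dM (NOT Sheth & Tormen): power law with exponential cutoff."""
    m_star = 1.0e12  # assumed characteristic mass, h^-1 Msun
    norm = 1.0e-3    # assumed normalization, h^3 Mpc^-3
    return norm / m_star * (m / m_star) ** -2.0 * math.exp(-m / m_star)


def toy_halo_lifetime(m):
    """Hypothetical halo lifetime in Gyr, taken to be mass independent."""
    return 1.0


def quasar_density(m_min, t_q, n_of_m, t_h_of_m, decades=3.0, steps=3000):
    """Right-hand side of equation (6), with the duty cycle capped at unity.

    Midpoint rule in ln(M) from m_min up to m_min * 10**decades; the upper
    limit stands in for infinity and is adequate only for steep mass functions.
    """
    ln_lo = math.log(m_min)
    d_ln = decades * math.log(10.0) / steps
    total = 0.0
    for i in range(steps):
        m = math.exp(ln_lo + (i + 0.5) * d_ln)
        duty = min(1.0, t_q / t_h_of_m(m))
        total += duty * n_of_m(m) * m * d_ln
    return total


def solve_m_min(phi, t_q, n_of_m, t_h_of_m, lo=1.0e10, hi=1.0e15, iterations=60):
    """Bisect in log(M) for the M_min at which equation (6) returns phi.

    Relies on the density decreasing monotonically as m_min increases.
    """
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if quasar_density(mid, t_q, n_of_m, t_h_of_m) > phi:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


phi_target = 5.0e-7  # illustrative space density, h^3 Mpc^-3
m_min = solve_m_min(phi_target, 0.01, toy_mass_function, toy_halo_lifetime)
print(f"toy M_min = {m_min:.2e} h^-1 Msun")
```

Because the mass function falls steeply at the high-mass end, a tenfold change in the assumed $t_Q$ shifts $M_{\rm min}$ by much less than a factor of ten, which is the qualitative behavior seen in Table 5.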

We compute the model predicted quasar correlation function
$\xi_{\rm model}(r,z) = b_{\rm eff}^2\,\xi_m(r,z)$ in steps of 0.1 in redshift, and
integrate it to obtain the averaged correlation function $\bar{\xi}(r)$ over some
redshift range via equation (C11). $\bar{\xi}(r)$ is to be compared with our measured
correlation function $\xi(r)$. We iterate until we find the $t_Q$ that minimizes the
difference between $\bar{\xi}(r)$ and $\xi(r)$. In practice, to compare the data and the
model, we use the integrated correlation function within
$[r_{\rm min}, r_{\rm max}]\,h^{-1}$ Mpc, defined as

$$ \xi_{20} = \frac{3}{r_{\rm max}^{3}} \int_{r_{\rm min}}^{r_{\rm max}} \xi(r)\, r^2 \, dr \tag{7} $$



where we choose $r_{\rm min} = 5\,h^{-1}$ Mpc to minimize nonlinear effects and
$r_{\rm max} = 20\,h^{-1}$ Mpc to maximize the signal-to-noise ratio; within this range
of scales, the model predicted and measured correlation functions are well approximated
by a single power law. If we assume $\xi(r) = (r/r_0)^{-\gamma}$, equation (7) reduces
to

$$ \xi_{20} = \frac{3\, r_0^{\gamma}}{(3 - \gamma)\, r_{\rm max}^{3}} \left( r_{\rm max}^{3-\gamma} - r_{\rm min}^{3-\gamma} \right) \tag{8} $$



Because the underlying dark matter correlation function within this scale range has a
power-law index close to 2.0, we adopt values from the fixed $\gamma = 2.0$ fitting
results in Table 4 instead of the variable power-law index fitting results. Hence we have
$\xi_{20} = 1.230 \pm 0.353$ for the $2.9 < z < 3.5$ bin and
$\xi_{20} = 2.406 \pm 0.586$ for the $z > 3.5$ bin, here using the
results from all fields.
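The numbers above can be reproduced from equations (7) and (8) with a short sketch. The function names are illustrative, and the error formula is our reading rather than something stated in the text: a first-order propagation $\sigma_{\xi_{20}} = \gamma\,\xi_{20}\,\sigma_{r_0}/r_0$ at fixed $\gamma$, which happens to match the quoted uncertainty for the $z > 3.5$ bin.

```python
def xi20_power_law(r0, gamma, r_min=5.0, r_max=20.0):
    """Equation (8): volume-averaged xi between r_min and r_max (gamma != 3)."""
    return (3.0 * r0 ** gamma / ((3.0 - gamma) * r_max ** 3)
            * (r_max ** (3.0 - gamma) - r_min ** (3.0 - gamma)))


def xi20_numeric(xi, r_min=5.0, r_max=20.0, n_steps=10000):
    """Equation (7) by midpoint integration for an arbitrary callable xi(r)."""
    dr = (r_max - r_min) / n_steps
    total = 0.0
    for i in range(n_steps):
        r = r_min + (i + 0.5) * dr
        total += xi(r) * r * r
    return 3.0 * total * dr / r_max ** 3


def xi20_error(r0, r0_err, gamma, r_min=5.0, r_max=20.0):
    """First-order error on xi_20 at fixed gamma (an assumed propagation rule)."""
    return gamma * xi20_power_law(r0, gamma, r_min, r_max) * r0_err / r0


# Fixed-gamma fit for z > 3.5, all fields (Table 4): r0 = 20.68 +/- 2.52 h^-1 Mpc
r0, r0_err = 20.68, 2.52
print(round(xi20_power_law(r0, 2.0), 3))
print(round(xi20_numeric(lambda r: (r / r0) ** -2.0), 3))
print(round(xi20_error(r0, r0_err, 2.0), 3))
```

For $\gamma = 2$ equation (8) collapses to $\xi_{20} = 3 r_0^2 (r_{\rm max} - r_{\rm min}) / r_{\rm max}^3 = 0.005625\, r_0^2$ (with $r_0$ in $h^{-1}$ Mpc), so the clustering strength scales as the square of the correlation length.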

Our adopted values of $\Phi(z)$ are taken from the maximum-likelihood fitted quasar
luminosity function (LF) with variable power-law index given by Richards et al. (2006),
integrated from the faintest $i$-band magnitude $i = 20.2$. That paper uses a slightly
different cosmology from our own; we correct by the ratio of comoving volume elements.
Fig. 20 of Richards et al. (2006) shows that the functional fit we use here does not
perfectly follow the data, giving values of $\Phi(z)$ as much as a factor of 1.5 off from
the actual value; in particular, the variable power-law fit function in Richards et al.
(2006) appears to underestimate the value of $\Phi(z)$ at $z < 4.5$ but slightly
overestimate the value at $z > 4.5$. This will probably cause slight underestimation and
overestimation of $t_Q$ (Eq. 6) for the lower and higher redshift bins respectively, but
the effect is tiny compared with other uncertainties. Table 6 lists the values of
$\Phi(z)$ we have calculated, along with other quantities. The limiting absolute $i$-band
magnitude at each redshift is calculated using the same cosmology and K-correction as in
Richards et al. (2006), normalized to $z = 2$. One subtlety is that quasars at $z < 3.0$
are close to the color cut at which the magnitude limit of the quasar sample changes
between $i = 19.1$ and 20.2 (see Fig. 17 of Richards et al. 2006). To account for this
effect, we use 3 times the density down to $i = 19.1$ for the redshift grid point at
$z = 2.9$ and 4 times the density down to $i = 19.1$ for the redshift grid point at
$z = 3.0$; the grid points with $z > 3.1$ use the integrated luminosity function to
$i = 20.2$ (see Fig. 17 of Richards et al. 2006). In practice, our results are
insensitive to these details.

FIG. 9. The Sheth & Tormen (1999) halo mass function, halo lifetime
and effective bias factors for halos with $M > M_{\rm min}$ as functions of halo mass
for three redshifts $z = 3$, 3.5, 4, in our fiducial cosmology. The age of the
universe at these three redshifts is 2.2, 1.9, and 1.6 Gyr, respectively, and
for typical halos with a mass of a few $\times 10^{12}\,h^{-1}\,M_\odot$, the halo lifetime is
approximately 0.7-1 Gyr at these redshifts.

To illustrate the relationship between $t_Q$, $b_{\rm eff}$, and $M_{\rm min}$, we choose
fixed values of $t_Q = 0.01$, 0.1, 1 Gyr at each redshift and obtain the corresponding
$M_{\rm min}$ and $b_{\rm eff}$ at $z = 3.0$, 3.5, and 4.0, listed in Table 5. Fig. 10
shows the evolution of the integrated quasar number density $\Phi(z)$, $M_{\rm min}(z)$
and $b_{\rm eff}(z)$ for the three trial values of $t_Q$. At each redshift we obtain the
model predicted correlation function $\xi_{\rm model}(r,z)$, which is then averaged over
our sample redshift range weighted by the observed quasar distribution (not corrected
for the selection function) following equation (C11).

FIG. 10. The top panel shows the integrated quasar luminosity function
(LF) down to the magnitude cut $i = 20.2$, computed using the variable power-law
fit function in Richards et al. (2006). The lower line segment shows the
integrated LF down to $i = 19.1$. The bottom two panels show the computed
minimum halo masses and effective bias factors as functions of redshift, for
the three trial values of $t_Q = 0.01$, 0.1 and 1 Gyr. We have used the empirical
values of $\Phi$ at the grid points $z = 2.9$ and 3.0 (i.e., three and four times the
values down to $i = 19.1$, respectively), which causes the jump in $M_{\rm min}$ and $b_{\rm eff}$
at these two redshift grid points, i.e., we are targeting more luminous quasars
at $z = 2.9$, 3.0. The slight kink around $z = 4.5$ in all three panels is due to the
K-correction (see Fig. 17 of Richards et al. 2006).

TABLE 5

Trial values of $t_Q$ at redshift $z = 3.0$, 3.5, 4.0 and the corresponding $M_{\rm min}$ and $b_{\rm eff}$, assuming the fiducial $\Lambda$CDM cosmology.

z      $\Phi$ ($h^3$ Mpc$^{-3}$)   $t_Q$ (Gyr)   $M_{\rm min}$ ($h^{-1}\,M_\odot$)   $b_{\rm eff}$
3.0    5.591 x 10^-7               0.01          2.33 x 10^12                        7.6
                                   0.1           6.10 x 10^12                        9.8
                                   1             1.32 x 10^13                        12.3
3.5    3.251 x 10^-7               0.01          2.09 x 10^12                        9.0
                                   0.1           4.98 x 10^12                        11.4
                                   1             9.76 x 10^12                        13.9
4.0    1.009 x 10^-7               0.01          2.29 x 10^12                        11.1
                                   0.1           4.87 x 10^12                        13.7
                                   1             8.41 x 10^12                        16.0

FIG. 11. Comparison of the measured and model predicted clustering
strength $\xi_{20}$, defined in equation (7). Solid lines correspond to the $2.9 < z < 3.5$
bin and dashed lines correspond to the $z > 3.5$ bin. The thick and
light horizontal lines show the measured clustering strength and $1\sigma$ errors.
The match of the model predicted $\xi_{20}$ (blue lines for the fiducial $\sigma_8 = 0.751$
and red lines for $\sigma_8 = 0.84$) with the measured $\xi_{20}$ gives the average quasar
lifetime $t_Q$ within that redshift bin. The uncertainty in the measured $\xi_{20}$ gives a
large uncertainty in $t_Q$. Quasars in the higher redshift bin have larger $t_Q$ on
average. The fiducial values of $t_Q$ inferred from this figure (the $\sigma_8 = 0.751$
case) are: $t_Q = 15$ Myr for $2.9 < z < 3.5$ and $t_Q = 160$ Myr for $z > 3.5$.

We compare the model predictions and measured values for the $2.9 < z < 3.5$ and
$z > 3.5$ redshift bins respectively. Fig. 11 plots the model predicted $\xi_{20}$ as a
function of $t_Q$ for the two redshift bins. Above $t_Q \sim 1$ Gyr, the duty cycle
saturates at unity, and the predicted correlation function flattens. The horizontal lines
show the values and $1\sigma$ errors of $\xi_{20}$ computed using our fixed power-law
fits, for the two redshift bins respectively. For the $2.9 < z < 3.5$ bin, the estimated
quasar lifetime is $t_Q \sim 15$ Myr with lower limit 3.6 Myr and upper limit 47 Myr for
the $1\sigma$ error of the measured $\xi_{20}$. For the $z > 3.5$ redshift bin, the
estimated quasar lifetime is $t_Q \sim 160$ Myr with lower limit $\sim 30$ Myr and upper
limit $\sim 600$ Myr for the $1\sigma$ error of the measured $\xi_{20}$. To phrase this
in terms of the duty cycle, we take the average halo lifetime to be 1 Gyr at these
redshifts (see Fig. 9). Therefore the duty cycle is 0.004-0.05 for the lower redshift bin
and 0.03-0.6 for the higher redshift bin.
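The conversion from lifetime to duty cycle is simple arithmetic; the sketch below spells it out, assuming (as in the text) a single average halo lifetime of 1 Gyr. The function name and the dictionary layout are illustrative only.

```python
def duty_cycle(t_q_myr, t_h_myr=1000.0):
    """Duty cycle t_Q / t_H, capped at unity as in equation (6)."""
    return min(1.0, t_q_myr / t_h_myr)


# (best estimate, lower limit, upper limit) of t_Q in Myr, as quoted in the text
lifetimes = {
    "2.9 < z < 3.5": (15.0, 3.6, 47.0),
    "z > 3.5": (160.0, 30.0, 600.0),
}

for label, (best, low, high) in lifetimes.items():
    print(f"{label}: best {duty_cycle(best):.3f}, "
          f"range {duty_cycle(low):.4f} to {duty_cycle(high):.3f}")
```

Rounded to one significant figure, the limits give the ranges 0.004-0.05 and 0.03-0.6 stated above.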

In the model we are using, $t_Q$ is very sensitive to the clustering strength, as shown
in Fig. 11. A small change in the measured quasar correlation function will result in a
substantial change in $t_Q$. Using different fitting results for the measured $\xi_{20}$
(e.g., those for good fields only) will certainly change the value of $t_Q$. However, the
formal $1\sigma$ errors of $t_Q$ are large enough to encompass these changes. The model
is also sensitive to the adopted value of $\sigma_8$, whose consensus value has changed
significantly since the release of the WMAP3 data (Spergel et al. 2006). By increasing
$\sigma_8$ we can increase the model predicted $\xi_{20}$ given the same $t_Q$.$^{18}$
The results for the WMAP first year value $\sigma_8 = 0.84$ (Spergel et al. 2003) are
also plotted in Fig. 11 as red lines. In this case the $t_Q$ values are slightly lower
for the two redshift bins, but are still within the $1\sigma$ errors of the fiducial
$\sigma_8$ case. Combining these effects, we conclude that this approach can only
constrain the quasar lifetime within a very broad range of $10^6$-$10^8$ yr, which is, of
course, consistent with many other approaches (e.g., Martini 2004 and references
therein). On the other hand, our results do show, on average, a larger $t_Q$ and duty
cycle for the higher redshift bin.

There are other assumptions in our model that we should consider. In particular, there is
the possibility that quasars cluster more than their dark matter halos due to physical
effects that modulate the formation of quasars on very large scales. For example, the
process of reionization may show large spatial modulation, which might affect the number
density of young galaxies and quasars on large scales (e.g., Babich & Loeb 2006). We have
also assumed that each halo hosts only one luminous quasar. However, Hennawi et al.
(2006a) show that quasars (at lower redshift) are very strongly clustered on small
scales, with some close binaries clearly in a single halo. Searches for multiple quasars
at higher redshift have also been successful (Hennawi et al., in preparation), suggesting
that at high redshift as well, a single halo can host more than one quasar.

$^{18}$ The $\xi_{20}$ result is insensitive to other cosmological parameters such as

TABLE 6

Quasar space density, $M_{\rm min}$ and $b_{\rm eff}$ at each redshift grid

NOTE. $M_{i,{\rm limit}}$ is the $i$-band limiting absolute magnitude, K-corrected to $z = 2$. … is the integrated quasar number density over the apparent magnitude
range, in the same cosmology as in Richards et al. (2006), converted using $h = 0.7$ to units of $h^3$ Mpc$^{-3}$. $\Phi$ is the corresponding quasar number density in our
cosmology, converted using $h = 0.71$ to $h^3$ Mpc$^{-3}$. There are three entries for each of the $z = 2.9$ and $z = 3.0$ grids, corresponding to a magnitude limit of $i = 20.2$
(one asterisk), $i = 19.1$ (two asterisks), and using the empirical values we adopted at these two redshift grids (see text; no asterisks). The apparent $i$-band limiting
magnitude cut is $i = 20.2$ for $z > 3.1$. $n_{\rm QSO}$ is the observed overall quasar number density for all fields, in the current cosmology; the difference between $n_{\rm QSO}$
and $\Phi$ reflects the selection function and difference between the fitted power-law function and binned luminosity function. $D(z)$ is the linear growth factor. Also
tabulated are the corresponding minimal halo mass $M_{\rm min}$ and effective bias factors $b_{\rm eff}$ at each redshift grid, computed using the fiducial values of $t_Q$, i.e., $t_Q = 15$
Myr for $2.9 < z < 3.5$ and $t_Q = 160$ Myr for $z > 3.5$.

Table 6 uses the fiducial values of $t_Q$ we derived for the $\sigma_8 = 0.751$ case to
estimate the minimal halo mass and bias factors of high redshift quasars, but the values
of $M_{\rm min}$ and $b_{\rm eff}$ depend only weakly on $t_Q$, as one can see from
Table 5. The values of $M_{\rm min}$ and $b_{\rm eff}$ are tabulated in Table 6 for each
of the redshift bins. Note that the change of $M_{\rm min}$ within each redshift bin may
not be real because we have assumed constant $t_Q$ throughout the redshift bin. On the
other hand, the host halos for the higher redshift bin have, on average, a larger minimal
halo mass of $\sim 4$-$6 \times 10^{12}\,h^{-1}\,M_\odot$ than that for the lower
redshift bin of $\sim 2$-$3 \times 10^{12}\,h^{-1}\,M_\odot$. This is expected, because
quasars in the higher redshift bin have higher mean luminosity and hence should reside in
more massive halos. From Table 6 it is clear that high redshift quasars are strongly
biased objects, and the effective bias factor increases with redshift.

4. SUMMARY AND CONCLUSIONS 

We have used $\sim 4000$ high redshift SDSS quasars to measure the quasar correlation
function at $z > 2.9$. The clustering of these high redshift quasars is stronger than
that of their low redshift counterparts. Over the range $4 < r_p < 150\,h^{-1}$ Mpc, the
real-space correlation function is fitted by a power-law form
$\xi(r) = (r/r_0)^{-\gamma}$ with $r_0 \sim 15\,h^{-1}$ Mpc and $\gamma \sim 2$. When we
divide the clustering sample into two broad redshift bins, $2.9 < z < 3.5$ and
$z > 3.5$, we find that the quasars in the higher redshift bin show substantially
stronger clustering properties, with a comoving correlation length
$r_0 = 24.3 \pm 2.4\,h^{-1}$ Mpc assuming a fixed power-law index $\gamma = 2.0$. The
lower redshift bin has a comoving correlation length $r_0 = 16.9 \pm 1.7\,h^{-1}$ Mpc,
assuming the same power-law index.

We followed Martini & Weinberg (2001) to relate this strong clustering signal to the
quasar luminosity function (Richards et al. 2006), the quasar lifetime and duty cycle,
and the mass function of massive halos. We find the minimum mass $M_{\rm min}$ of halos
in which luminous quasars in our sample reside, as well as the clustering bias factor for
these halos. High redshift quasars are highly biased objects with respect to the
underlying matter, while the minimal halo mass shows no strong evolution with redshift
for our flux-limited sample. Quasars with $2.9 < z < 3.5$ reside in halos with typical
mass $\sim 2$-$3 \times 10^{12}\,h^{-1}\,M_\odot$; quasars with $z > 3.5$ reside in halos
with typical mass $\sim 4$-$6 \times 10^{12}\,h^{-1}\,M_\odot$. The slight difference of
$M_{\rm min}$ in the two redshift bins is expected because quasars in the higher redshift
bin have mean luminosity that is approximately two times that of quasars in the lower
redshift bin, and should reside in more massive halos. We further estimated the quasar
lifetime $t_Q$. We get a $t_Q$ value of 4-50 Myr for the $2.9 < z < 3.5$ bin and 30-600
Myr for the $z > 3.5$ bin, which is broadly consistent with the quasar lifetime of
$10^6$-$10^8$ yr estimated from other methods (e.g., Martini 2004 and references
therein). This corresponds to a duty cycle of 0.004-0.05 for the lower redshift bin and
0.03-0.6 for the higher redshift bin, where we take the average halo lifetime to be 1
Gyr. In general we find the average lifetime is higher for the higher redshift bin, which
could be due either to redshift evolution or to the luminosity dependence of $t_Q$.
However, we emphasize that our approach is subject to a variety of uncertainties,
including errors in the clustering measurements themselves, uncertainties in $\sigma_8$
and the halo mass function, and the validity of the assumptions we have adopted.

It is interesting to note that recent Chandra and XMM- 
Newton studies on the clustering of X-ray selected AGN have 
revealed a larger correlation length than optical AGN. In par- 
ticular, hard X-ray AGN have a correlation length ro ~ 15 h~ l 
Mpc at z < 2 (e.g., Basilakos et al. 2004; Gilli et al. 2005; 
Puccetti et al. 2006; Plionis 2006). Given the fact that X- 
ray selected AGN have considerably lower mean bolometric 
luminosity than do optically-selected AGN (e.g., Mushotzky 
2004), this implies, once again, that the instantaneous lumi- 
nosity is not a reliable indicator of the host halo mass at the 
low luminosity end (e.g., Hopkins et al. 2005). Shen et al. 
(2007) have suggested an evolutionary model of AGN accre- 
tion in which an AGN evolves from being dominant in the 
optical to dominant in X-rays when the accretion rate drops. 
Hence those strongly clustering hard X-ray AGN were proba- 
bly once very luminous quasars in the past with high peak lu- 
minosities. When they dim and turn into hard X-ray sources, 
their spatial clustering strength remains. However, the cur- 
rent X-ray AGN sample is still very limited compared with 
optically selected samples, hence the uncertainty in the X-ray 
AGN correlation length is large. 

The work described in this paper can be extended in a va- 
riety of ways. Our sample cannot explore clustering below 
$\sim 1\,h^{-1}$ Mpc because of fiber collisions; we are extending
the methods of Hennawi et al. (2006a) to find close pairs of 
high-redshift quasars, to determine whether the excess clus- 
tering found at moderate redshift extends to z > 3. Extend- 
ing the clustering analysis to lower luminosities will be im- 
portant, given theoretical predictions of a strong luminosity 
dependence to the clustering signal at high redshifts (Hop- 
kins et al. 2006). The repeat scans of the Southern Equatorial 
Stripe in SDSS (Adelman-McCarthy et al. 2007) will allow 
us to extend the luminosity range of our sample, and redshifts 
of the fainter quasars are already being obtained (Jiang et al. 



2006). The massive halos that we predict host the luminous 
quasars must also contain a substantial number of ordinary 
galaxies, and we plan deep imaging surveys of high-redshift 
quasar fields to measure the quasar-galaxy cross-correlation
function (see Stiavelli et al. 2005; Ajiki et al. 2006). Finally, 
more work is needed on simulations of quasar clustering. Our 
quasar lifetime/duty cycle calculation is frustratingly impre- 
cise, and further explorations of the behavior of highly biased 
rare halos at high redshifts may yield ways to constrain duty 
cycles more directly from the data, and understand the uncer- 
tainties of the technique in more detail. 

Finally, we need to make more detailed comparisons of 
high-redshift quasar clustering with that of luminous galax- 
ies at the same redshift. The duty cycle of quasars at these 
redshifts is a few percent at most, thus there is a population 
of galaxies with quiescent central black holes that is just as 
strongly clustered. The correlation length of Lyman-break 
galaxies at these redshifts is $\sim 5\,h^{-1}$ Mpc (Adelberger et al.
2005a), but the clustering strength appears to increase (albeit 
at z ~ 2) with increasing observed K-band luminosity (Adel- 
berger et al. 2005b; Allen et al. 2005) and/or color (Quadri 
et al. 2006). The duty cycle we have calculated should agree 
with the ratio of number densities of luminous quasars, and 
that of the parent host galaxy population. The challenge will 
be to identify this parent population unambiguously.

TABLE 7
Emission line shifts

                                       Ly$\alpha$ - Si IV   Ly$\alpha$ - C IV   Si IV - Mg II   C IV - Mg II   C III] - Mg II   Mg II - [O III]
mean vel shift (km s$^{-1}$)           -463                 -1478               61              921            827              -97
$\sigma$ (km s$^{-1}$)                 1178                 1217                744             746            604              269

y = ax + b                             a          b (km s$^{-1}$)   $\sigma$ (km s$^{-1}$)
C IV - Mg II vs. Si IV - C IV          -0.5035    486.7             660
C IV - Mg II vs. C III] - C IV         -0.8024    845.8             594
Si IV - Mg II vs. Si IV - C III]       0.6958     596.5             569

NOTE. The Mg II - [O III] (i.e., systemic) line shift and $1\sigma$ error are taken from Richards et al. (2002). Positive values indicate a blueshift.
The dispersion of the shift between C IV and Mg II is somewhat larger than the value of 511 km s$^{-1}$ quoted by Richards et al. (2002), but is consistent with their
recent result using a much larger sample from SDSS DR4 ($\sim 770$ km s$^{-1}$).



APPENDIX 

A. QUASAR REDSHIFT DETERMINATION 
A.1 Broad Emission Line Shifts 

High-redshift quasars (z > 2.9) have only a few strong emission lines that fall within the SDSS spectral coverage (3800-9200 Å): Lyα (1216 Å), SiIV/OIV (1397 Å), CIV (1549 Å) and CIII] (1909 Å). The Lyα emission line is heavily affected by the Lyα forest, and is blended with NV 1240 Å. In addition, high-ionization broad emission lines such as CIV are blueshifted by several hundred km s$^{-1}$ from the redshift determined from narrow forbidden lines like [OIII] 5007 Å (e.g., Gaskell 1982; Tytler & Fan 1992; Richards et al. 2002a). We could simply correct the redshift derived from each observed line for the (known) mean offset of that line from systemic (e.g., Vanden Berk et al. 2001; Richards et al. 2002a). We can do better than this, however, by examining the relationships between the shifts of different lines. 

To understand these relationships, we use a sample of quasars drawn from the SDSS DR3 quasar catalog (Schneider et al. 2005) with 1.8 < z < 2.2; for these objects, the lines SiIV, CIV, CIII] and MgII 2800 Å all fall in the SDSS spectral coverage. The MgII line has a small and known offset from the systemic redshift (Richards et al. 2002a), thus tying our results to MgII allows us to determine the systemic redshift for each object. We exclude from the sample those objects which show evidence for a broad absorption line, determined using the "balnicity" index (BI) of Weymann et al. (1991) and using the Vanden Berk et al. (2001) quasar composite spectrum to define the continuum level. 

We fit a log-normal to each of the four lines (with a second log-normal added for the neighboring lines HeII 1640 Å and AlIII 1857 Å), together with the local continuum. The centroid for each line is determined following Hennawi et al. (2006b): we calculate the mode of the pixels within ±1.5σ of the fitted Gaussian line center using 3 × median - 2 × mean. We include in the mode calculation those pixels with flux: 

$$ f_\lambda > \frac{A_i}{2} + C_\lambda + \sum_{j \neq i} A_j\, e^{-(\log_{10}\lambda - \log_{10}\lambda_j)^2 / 2\sigma_j^2} , \tag{A1} $$

where $A_i$, $\log_{10}\lambda_i$ and $\sigma_i$ are the amplitude, central wavelength, and dispersion of the best-fit log-normal to the $i$th emission line and $C_\lambda$ is the linear continuum. Lines with a signal-to-noise ratio (S/N) less than 6 per pixel, or with log-normal fits with $\chi^2 > 5$, are rejected from further consideration. This gives us a sample of 1652 quasars with robust line measurements. Fig. 12 shows the distribution of shifts between various lines. The means and standard deviations of these distributions are given in Table 7. The contribution from the line fitting error is negligible compared to the "intrinsic" dispersion of velocity shifts. 
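The mode estimator quoted above, 3 × median - 2 × mean, can be illustrated with a minimal sketch. The function name `pearson_mode` and the sample values are hypothetical, and the sketch applies the relation to a generic list of numbers; how the selected pixels enter the calculation in practice (for example, any flux weighting) is not specified here and is left as an assumption.

```python
import statistics

def pearson_mode(values):
    """Empirical mode estimate: 3 * median - 2 * mean.

    `values` is any non-empty sequence of numbers. Which quantity is
    sampled (e.g. pixel wavelengths inside the +/-1.5 sigma window that
    pass the flux cut) is an assumption left to the caller.
    """
    return 3 * statistics.median(values) - 2 * statistics.mean(values)

# A skewed toy sample: the estimate moves away from the long tail.
skewed = [1, 2, 3, 4, 10]
symmetric = [1, 2, 3]
print(pearson_mode(skewed), pearson_mode(symmetric))
```

For a symmetric sample the estimate coincides with the mean and median; for a skewed sample it is pulled toward the peak, which is the motivation for using it on asymmetric line profiles.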

These line shifts are correlated with each other, as Fig. 13 shows. In each panel, we show the best-fit line to the correlations, giving each point equal weight. Given these correlations, we can use the shifts between the lines we observe at high redshift to determine the offset to MgII, and thus to the systemic redshift. 

There are also correlations between the lineshifts and quantities such as the quasar luminosity, color, line width, and equivalent 
width. However, these correlations show large scatter, and are therefore not as good for determining the true redshifts of the 
quasars. 

A.2 Lyα - SiIV, Lyα - CIV Line Shifts 

The CIV line lies beyond the SDSS spectra for z > 4.9. In addition, some quasars have weak metal emission lines, which are of too low S/N to allow us to measure a redshift from them. In these cases, we measure the redshift from the Lyα line. In order to understand the biases that this introduces, we selected a sample of 1114 non-BAL quasars with 2.9 < z < 4.8 with high S/N SiIV and CIV lines. The center of the Lyα line was taken to be the wavelength of maximum flux. To reduce the effects of fluctuations and strong skylines, we mask out 5σ outliers from the 20-pixel smoothed spectrum and the 5577 Å skyline region (about 20 pixels), and smooth the spectrum by 15 pixels before identifying the peak pixel; all spectra were examined by eye to confirm that we correctly identified the peak of Lyα. 

Fig. 14 shows the shifts between Lyα and the CIV and SiIV lines as a function of redshift. The mean shift is ~ 500 km s$^{-1}$, with a 1σ scatter of 1200 km s$^{-1}$ for Lyα - SiIV; and is ~ 1500 km s$^{-1}$, with a 1σ scatter of 1200 km s$^{-1}$ for Lyα - CIV. This systematic offset is caused by absorption by the Lyα forest blueward of the line; over this redshift range, the increasing strength of the forest does not cause an appreciable increase in the shift. The Lyα line is blended with the NV line, therefore whenever we use Lyα as the only estimator for redshift, we examine the spectrum by eye to confirm that we have identified the correct line. 

FIG. 12. Distributions of relative shifts of the modes of various emission lines, as measured for 1652 high S/N, non-BAL quasars with redshifts between 1.8 and 2.2. The mean values and 1σ deviations of these line shifts are listed in Table 7. 

A.3 Determination of Redshifts 

We are now ready to determine unbiased redshifts for our sample of z > 2.9 quasars. Given the first guess of the redshift of each object from Schneider et al. (2005) for those objects included in DR3, and from the two spectroscopic pipelines (§ 2.1), we fit the centroids of the SiIV, CIV and CIII] lines as we described above. 

For objects in which the centroids of all three lines are well-determined (we require that a line have a mean S/N per pixel > 4 and reduced $\chi^2 < 10$), we base the redshift on the centroid of CIV. We measure the shift between CIV and SiIV, and the shift between CIV and CIII], and determine from each the expected CIV - MgII line shift using the correlations in Fig. 13 and Table 7. We average these lineshifts together, and add on the small correction from MgII to systemic given by Richards et al. (2002a); this gives our final CIV to systemic shift and hence the redshift. The uncertainty in these shifts gives rise to an uncertainty $\sigma_v = 519$ km s$^{-1}$, or $\sigma_z = (1+z)\sigma_v/c$. 
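The three-line procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the code used for the measurements: the function names are hypothetical, the fit coefficients and the mean MgII - [OIII] shift are copied from Table 7, the input shifts are assumed to be in km s$^{-1}$ with the sign convention of Table 7, and the conversion of the final velocity shift into a corrected redshift is deliberately left out because it depends on that sign convention.

```python
C_KMS = 299792.458  # speed of light in km/s

# Linear fits y = a*x + b from Table 7 (y and b in km/s).
FIT_SIIV_CIV = (-0.5035, 486.7)   # y = CIV - MgII, x = SiIV - CIV
FIT_CIII_CIV = (-0.8024, 845.8)   # y = CIV - MgII, x = CIII] - CIV
MGII_TO_SYSTEMIC = -97.0          # mean MgII - [OIII] shift from Table 7

def civ_to_systemic_shift(siiv_civ, ciii_civ):
    """Average the two predicted CIV - MgII shifts, then add MgII - [OIII].

    Both arguments are measured line shifts in km/s (hypothetical inputs).
    """
    predictions = [
        FIT_SIIV_CIV[0] * siiv_civ + FIT_SIIV_CIV[1],
        FIT_CIII_CIV[0] * ciii_civ + FIT_CIII_CIV[1],
    ]
    return sum(predictions) / len(predictions) + MGII_TO_SYSTEMIC

def redshift_error(z, sigma_v):
    """sigma_z = (1 + z) * sigma_v / c, with sigma_v in km/s."""
    return (1.0 + z) * sigma_v / C_KMS

print(civ_to_systemic_shift(0.0, 0.0))   # shift when both measured shifts vanish
print(redshift_error(3.0, 519.0))        # redshift error at z = 3
```

At z = 3 the quoted velocity uncertainty corresponds to a redshift error of roughly 0.007.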

For quasars with only two high S/N lines, we take CIV whenever we have it and SiIV when CIV is absent (we avoid using CIII] because it is often near the upper wavelength limit, 9200 Å, of the SDSS spectra). Again, we use the correlations of Fig. 13 to compute the line shift relative to MgII and therefore the shift relative to the systemic redshift. The errors in the resulting velocity shift relative to systemic are: 713 km s$^{-1}$ if the two lines are SiIV and CIV; 629 km s$^{-1}$ if the two lines are SiIV and CIII]; and 652 km s$^{-1}$ if the two lines are CIV and CIII]. For quasars with only one well-detected line, we use the average line shift, and propagate the errors to determine the uncertainty in the line shift relative to systemic. These errors are: 791 km s$^{-1}$ for SiIV, 793 km s$^{-1}$ for CIV and 661 km s$^{-1}$ for CIII]. Finally, for those quasars with no well-detected metal lines, we use Lyα to determine the redshift, using the average line shift relative to CIV and the corresponding 1σ dispersion to compute the error: adding the uncertainties in the transformations in quadrature gives an error of 1453 km s$^{-1}$. 
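The quoted velocity errors can be checked against Table 7 with a short quadrature calculation. The pairing of dispersions below is our reading of the numbers and is an assumption: each case combines one Table 7 dispersion with the 269 km s$^{-1}$ dispersion of the MgII to systemic shift, the three-line case treats the two CIV - MgII predictions as independent and averages them, and the Lyα case combines the single-line CIV error with the Lyα - CIV dispersion.

```python
import math

def quadrature(*sigmas):
    """Root-sum-square of independent 1-sigma errors (km/s)."""
    return math.sqrt(sum(s * s for s in sigmas))

SIGMA_MGII_SYS = 269.0  # MgII - [OIII] dispersion, Table 7

errors = {
    # three lines: mean of two predictions (660 and 594 km/s scatter)
    "three lines": quadrature(quadrature(660.0, 594.0) / 2.0, SIGMA_MGII_SYS),
    # two lines: scatter about the relevant linear fit
    "SiIV+CIV": quadrature(660.0, SIGMA_MGII_SYS),
    "SiIV+CIII]": quadrature(569.0, SIGMA_MGII_SYS),
    "CIV+CIII]": quadrature(594.0, SIGMA_MGII_SYS),
    # one line: dispersion of the mean shift to MgII
    "SiIV only": quadrature(744.0, SIGMA_MGII_SYS),
    "CIV only": quadrature(746.0, SIGMA_MGII_SYS),
    "CIII] only": quadrature(604.0, SIGMA_MGII_SYS),
}
# Lya only: single-line CIV error combined with the Lya - CIV dispersion
errors["Lya only"] = quadrature(round(errors["CIV only"]), 1217.0)

for case, value in errors.items():
    print(f"{case:12s} {value:7.1f} km/s")
```

Under these assumptions every rounded value matches the corresponding number quoted in the text (519, 713, 629, 652, 791, 793, 661 and 1453 km s$^{-1}$).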

Finally, we examine the spectra of the following classes of objects by eye to check the redshift determinations: (1) the 407 objects with $|z_i - z_{\rm sys}| > 3\sigma_z$, where $z_i$ is the initial redshift from the DR3 QSO catalog or SDSS spectroscopic pipeline, $z_{\rm sys}$ is our best estimate of the redshift and $\sigma_z$ is the estimated redshift error; (2) the 327 objects for which the redshift was based on Lyα; and (3) serendipitously found ambiguous cases. Of the ~ 750 objects we inspected by eye, our redshift as determined above was superior to the value from Schneider et al. (2005) or the pipelines in 70% of the quasars; for 15%, at least one of the pipeline redshifts was correct and was therefore adopted, and for the remaining 15% (many of them BAL quasars), neither redshift was correct. In the latter case, we refit the redshift by hand, and assigned a redshift error $\sigma_z$ between 0.01 and 0.05, depending on how messy the spectrum was. There were 29 objects whose redshifts were undetermined, lay below 2.9, or were simply not quasars. Thus the parent sample, from which we will construct our clustering subsample, contains 6,109 objects (including ~ 200 duplicates). 

FIG. 13. Correlations between various emission line shifts. Blue dots are data points and red lines are fitted linear functions. These correlations are used in our redshift estimation. The fitted linear parameters and 1σ deviations are listed in Table 7. 

FIG. 14. Relative shifts of Lyα versus SiIV and CIV emission lines as a function of redshift. Red lines indicate the mean value of line shifts. The mean values of line shifts and 1σ deviations are listed in Table 7. 

Finally, we compared the redshifts in our sample with the separately compiled DR5 quasar sample of Schneider et al. (2006). 
The difference in redshifts follows a Gaussian distribution with zero mean and a dispersion of 0.01, comparable to our estimated 
errors. 

B. SURVEY GEOMETRY 

SDSS spectroscopic targets are selected from the imaging data, and thus the spectroscopic footprint is a complicated combination of the individual runs which make up the imaging data, and the circular 1.49° radius tiles on which spectroscopic targets are assigned to fibers. Here we describe how this footprint is quantified. It will be useful in the following discussion to refer to Fig. 15. Related discussions of the SDSS survey footprint in the context of galaxy samples may be found in Appendix A2 of Tegmark et al. (2004) and in Blanton et al. (2005). 

As described in York et al. (2000), each imaging run of the SDSS covers part of a strip; two adjacent strips make a filled stripe of width 2.5°. Spectroscopic targeting to define a set of tiles is done off contiguous pieces of stripes termed targeting chunks; the SDSS imaging never got so far ahead of the spectroscopy as to allow a targeting chunk to work off more than one stripe at a time. The targeting in a given chunk all uses the same version of the target selection code (an important consideration for us, given the change in quasar target selection following DR1; § 2.2). Each targeting chunk is bounded on the East and West by lines of constant μ (i.e., the SDSS great circle coordinate; see Pier et al. 2003), and, for stripes in the Northern Galactic Cap, in the North-South direction by lines of constant η (i.e., the SDSS survey coordinate). Targeting chunks in the three stripes in the Southern Galactic Cap have no η boundary applied. Targeting chunks never overlap, therefore the union of targeting chunks defines the geometry of the targeting regions as a whole. Parameters defining the geometry of the targeting chunks can be found in a table called Chunk$^{19}$ in the CAS. 

As described by Blanton et al. (2003), targets in each chunk are assigned to tiles, and then to fibers within each plate. We first define tiling chunks (referred to as "tiling regions" by Blanton et al. 2003; Blanton et al. 2005), each of which is a set of non-overlapping tiling rectangles bounded by constant coordinates in different coordinate systems (all three types of coordinate systems, as well as mixtures of them, are used in describing the tiling rectangles, and a flag in the TilingBoundary table in the CAS indicates the coordinate type). Each of these tiling rectangles lies completely within a single targeting chunk so that the target selection version is unique throughout the rectangle. 

Although tiling rectangles of the same tiling chunk never overlap, tiling rectangles from different tiling chunks can overlap; an example is the upper-left blue rectangle and the middle main green rectangle in Fig. 15. On the other hand, a tiling rectangle never straddles two targeting chunks, so the target selection version is the same over the rectangle. A tiling chunk as a whole can straddle more than one targeting chunk, and can have tiling rectangles that do not all use the same version of the target selection pipeline. A set of spectroscopic tiles of radius 1°.49 is placed in each tiling chunk, and fibers are assigned to the targeted objects therein, following the algorithm of Blanton et al. (2003). Thus although the tiles often extend beyond the boundaries of the tiling chunk (see Fig. 15), they do not include any targets beyond the tiling chunks. The intersection of the tiling rectangles and the circular tiles defines sectors: each sector is covered by a unique set of tiles (see Figure 3 of Blanton et al. 2005), and is a spherical polygon as described by Hamilton & Tegmark (2004). The union of all the sectors defines the angular coverage of the SDSS. We say a sector is a "non-overlap sector" if it is covered by only one tile (the lighter colors in Fig. 15), and is an "overlap sector" if it is covered by more than one tile (indicated with darker colors in the figure). 

The tiling chunk geometry information is taken from the TilingBoundary table (which, itself, is a view of the TilingGeometry table with all the tiling masks removed) in the DR5 CAS server. We reject those tiling rectangles with target version lower than v3_1_0. The spectroscopic tile (plate) information is taken from the maindr5spectro.par table from the DR5 website$^{20}$, which only includes tiles in the main survey and records which tiling chunk each tile belongs to. We create the sectors by combining the two geometries using the spherical polygon description in Hamilton & Tegmark (2004). When computing the effective area of either all the non-overlap sectors or all the overlap sectors we use the balkanization procedure in A. Hamilton's product mangle$^{21}$ to remove duplicate area. 

After rejecting those tiling rectangles which used earlier target versions, our sample covers a solid angle of 4041 deg$^2$, of which roughly 30% is in overlap sectors. Because quasars in the overlap regions are not subject to the restriction of not targeting pairs separated by less than 55", and because the tiling algorithm deliberately places the tile overlap in regions of higher target density, one concern is that the angular selection function needs to take into account a higher selection probability in the overlap regions. However, we found that the number density of quasar candidates (here looking at all redshifts, not just the high-redshift candidates), and the number density of spectroscopically confirmed quasars, were essentially identical in the overlap and non-overlap sectors. In contrast, the number density of spectroscopic galaxies in the overlap sectors (93.1 deg$^{-2}$) is 23% higher than that in the non-overlap sectors (75.4 deg$^{-2}$), due to the deliberate placing of the overlaps in regions of high target density; galaxies dominate the SDSS spectroscopic targets, and beyond a subtle effect due to gravitational lensing (Scranton et al. 2005), we expect no correlation between the background quasars and the foreground galaxies. All this means that the angular selection function of our sample can be assumed to be uniform within the mask defined by the sectors that make up our sample. For DR5, the overall spectroscopic completeness of quasar candidates is ~ 95%, and the fraction of quasar candidates that are indeed quasars is ~ 48%. The angular quasar number density is ≈ 9.4 deg$^{-2}$. 

19 We used the TARGET (not BEST) version of the Chunk table. 

20 http://www.sdss.org/dr5/ 

21 http://casa.colorado.edu/~ajsh/mangle 

FIG. 15. Portion of the targeting and tiling geometry in SDSS spectroscopy. The targeting chunks are denoted by stripes bounded by black lines and each targeting chunk is targeted using one target version. Gray stripes are targeting chunks with target version no lower than v3_1_0 (not necessarily the same version); one dark gray targeting chunk shown here is targeted with target version v2_13_5. Within targeting chunks we carve out tiling rectangles, each of which is targeted with a unique version. A set of tiling rectangles forms a tiling chunk. Shown here as examples are tiling chunk 38, which has one rectangle (red) targeted with version v2_13_5 and three rectangles (green) with version v3_1_0; and tiling chunk 67, whose rectangles (blue) are all with target version v3_1_0 or later. Within each tiling chunk we place tiles (1°.49 radius circles, which appear as ellipses because the aspect ratio of the region of sky shown is not 1 : 1); tiles are trimmed by the boundaries of rectangles of that tiling chunk and balkanized (see Hamilton & Tegmark 2004) into non-overlap sectors (which are covered by only one tile) and overlap sectors (which are covered by more than one tile). We use light and dark colors to denote the two types of sectors in the above two tiling chunks. Note that though balkanized sectors of the same tiling chunk do not intersect with each other, they could intersect with sectors of another tiling chunk. In the above case, the upper-left corner rectangle in tiling chunk 67 is completely within the middle main rectangle of tiling chunk 38. Therefore one should be careful when computing the effective area of sectors. In constructing our clean subsample for clustering analysis, we reject those sectors that are within tiling rectangles which are targeted with target version lower than v3_1_0, i.e., regions such as the red rectangle in chunk 38. 

C. RELATIONSHIP BETWEEN HALO MASS, CLUSTERING STRENGTH, AND QUASAR LIFETIME 

In this appendix we follow Martini & Weinberg (2001), and provide some essential formulae to compute the quasar lifetime $t_Q$ and duty cycle using the measured correlation length and quasar number density. 

The Martini-Weinberg model is very sensitive to the halo number density at the high-mass end, hence a more suitable fitting function is needed. The Press & Schechter (1974; PS) halo number density as a function of halo mass M and redshift is given by: 

$$ n(M,z)\,dM = -\sqrt{\frac{2}{\pi}}\,\frac{\rho_0}{M}\,\frac{\delta_c(z)}{\sigma^2(M)}\,\frac{d\sigma(M)}{dM}\,\exp\left[-\frac{\delta_c^2(z)}{2\sigma^2(M)}\right] dM , \tag{C1} $$

where $\rho_0 = 2.78 \times 10^{11}\,\Omega_m h^2\,M_\odot\,{\rm Mpc}^{-3}$ is the mean density of the universe at z = 0; $\sigma(M)$ is the current (z = 0) rms linear density fluctuation smoothed by a spherical top-hat with radius $r = (3M/4\pi\rho_0)^{1/3}$, normalized by $\sigma_8$; and $\delta_c(z) = \delta_{c,0}/D(z)$ is the threshold density for collapse of a homogeneous spherical perturbation at redshift z, with D(z) the growth factor and $\delta_{c,0}$ the critical threshold at z = 0, given in Appendix A of Navarro, Frenk, & White (1997). The Sheth-Tormen (ST) halo mass function is (Sheth & Tormen 1999) 



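$$ n(M,z)\,dM = -A\,\sqrt{\frac{2a}{\pi}}\,\frac{\rho_0}{M}\,\frac{\delta_c(z)}{\sigma^2(M)}\,\frac{d\sigma(M)}{dM}\left\{1 + \left[\frac{\sigma^2(M)}{a\,\delta_c^2(z)}\right]^{p}\right\}\exp\left[-\frac{a\,\delta_c^2(z)}{2\sigma^2(M)}\right] dM , \tag{C2} $$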



where A = 0.3222, a = 0.707 and p = 0.3. We compared the ST and PS formalism using the z = 3 and z = 4 outputs of a cosmological N-body simulation generated from the TPM code of Paul Bode and Jeremiah P. Ostriker (Bode, Ostriker, & Xu 2000; Bode & Ostriker 2003), which assumed the WMAP3 cosmology ($\Omega_m = 0.26$, $\Omega_\Lambda = 0.74$, $H_0 = 72$ km s$^{-1}$ Mpc$^{-1}$, spectral index $n_s = 0.95$, and $\sigma_8 = 0.77$). The simulation included ~ $10^9$ particles in a box 1000 comoving $h^{-1}$ Mpc on a side; the mass per particle was $6.72 \times 10^{10}\,h^{-1} M_\odot$. Dark matter halos were identified with the Friends-of-Friends algorithm using a linking parameter one fifth of the mean interparticle separation of the simulation. We found that the mass function in the simulations for $M > 2 \times 10^{12}\,h^{-1} M_\odot$ followed the ST predictions closely, while the PS form increasingly underpredicted the simulations at large masses, in agreement with a number of other authors (e.g., Sheth & Tormen 1999; Jenkins et al. 2001; Heitmann et al. 2006). Therefore we use the ST formula for the halo mass function throughout the paper. 
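The difference between the two mass functions at the high-mass end can be illustrated with a minimal sketch. Both forms share the prefactor $-(\rho_0/M)\,\sigma^{-1}\,d\sigma/dM$, so their ratio depends only on the peak height $\nu = \delta_c(z)/\sigma(M)$. The function names and the sample values of $\nu$ are hypothetical; the ST constants are the ones quoted above, and PS corresponds to A = 0.5, a = 1, p = 0 in the same expression.

```python
import math

A_ST, a_ST, p_ST = 0.3222, 0.707, 0.3   # Sheth-Tormen parameters from the text

def f_ps(nu):
    """Press-Schechter multiplicity function, nu = delta_c(z) / sigma(M)."""
    return math.sqrt(2.0 / math.pi) * nu * math.exp(-0.5 * nu * nu)

def f_st(nu):
    """Sheth-Tormen multiplicity function in the same variable."""
    return (A_ST * math.sqrt(2.0 * a_ST / math.pi) * nu
            * (1.0 + (1.0 / (a_ST * nu * nu)) ** p_ST)
            * math.exp(-0.5 * a_ST * nu * nu))

def st_over_ps(nu):
    """Ratio of ST to PS halo abundance at fixed mass and redshift."""
    return f_st(nu) / f_ps(nu)

for nu in (1.0, 2.0, 3.0, 4.0):   # illustrative peak heights
    print(nu, round(st_over_ps(nu), 3))
```

The ratio falls below unity near $\nu \sim 1$ and rises well above unity for rare peaks, which is the regime relevant for the massive halos considered here.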
The rms density fluctuation at z = 0, $\sigma(M)$, is given by 

$$ \sigma(M) = \left[\frac{1}{2\pi^2}\int_0^\infty dk\,k^2 P(k)\,W^2(kr)\right]^{1/2} , \tag{C3} $$



where $W(kr) = 3(\sin kr - kr\cos kr)/(kr)^3$ is the filter function for a spherical top-hat. The CDM power spectrum is $P(k) \propto k^{n_s} T^2(k)$, where $n_s$ is the primeval inflationary power spectrum index and T(k) is the transfer function, given by (Bardeen et al. 1986): 

$$ T(k) = \frac{\ln(1 + 2.34q)}{2.34q}\left[1 + 3.89q + (16.1q)^2 + (5.46q)^3 + (6.71q)^4\right]^{-1/4} , \tag{C4} $$



where $q = k/\Gamma$ and $\Gamma$ is the CDM shape parameter (with units of $h$ Mpc$^{-1}$), given approximately by $\Gamma = \Omega_M h \exp[-\Omega_b - (2h)^{1/2}\Omega_b/\Omega_M]$ (Sugiyama 1995). Using this CDM power spectrum we numerically integrate equation (C3) to obtain $\sigma(M)$ and $d\sigma(M)/dM$. The rms fluctuation at redshift z is thus given by 

$$ \sigma(M,z) = \sigma(M)\,D(z) , \tag{C5} $$



from which we can define a characteristic mass scale $M_*$, such that $\sigma[M_*(z)] = \delta_c(z)$. 
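The numerical integration of equation (C3) can be sketched as below. This is an illustration, not the code used for the results in this paper: the function names, the integration limits and the number of steps are hypothetical choices, the baryon density `OMEGA_B` is an assumed value that is not quoted in the text, and the integral is evaluated with a simple trapezoidal rule in $\ln k$. Masses are in $h^{-1} M_\odot$, lengths in $h^{-1}$ Mpc and wavenumbers in $h$ Mpc$^{-1}$.

```python
import math

# Cosmology quoted for the simulation; OMEGA_B is an assumption (not in the text).
OMEGA_M, OMEGA_B, H, N_S, SIGMA_8 = 0.26, 0.044, 0.72, 0.95, 0.77
RHO_0 = 2.78e11 * OMEGA_M   # mean density, (h^-1 Msun) per (h^-1 Mpc)^3
GAMMA = OMEGA_M * H * math.exp(-OMEGA_B - math.sqrt(2.0 * H) * OMEGA_B / OMEGA_M)

def transfer(k):
    """Transfer function of equation (C4), with q = k / Gamma."""
    q = k / GAMMA
    poly = 1.0 + 3.89 * q + (16.1 * q) ** 2 + (5.46 * q) ** 3 + (6.71 * q) ** 4
    return math.log1p(2.34 * q) / (2.34 * q) * poly ** -0.25

def tophat(x):
    """Spherical top-hat filter; series expansion avoids cancellation at small x."""
    if x < 1e-2:
        return 1.0 - x * x / 10.0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

def sigma_unnormalized(r, kmin=1e-4, kmax=1e2, steps=4000):
    """Equation (C3) with P(k) = k^n_s T^2(k), integrated in ln k."""
    ln0, ln1 = math.log(kmin), math.log(kmax)
    dlnk = (ln1 - ln0) / steps
    total = 0.0
    for i in range(steps + 1):
        k = math.exp(ln0 + i * dlnk)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * k ** (3.0 + N_S) * transfer(k) ** 2 * tophat(k * r) ** 2
    return math.sqrt(total * dlnk / (2.0 * math.pi ** 2))

NORM = SIGMA_8 / sigma_unnormalized(8.0)   # fix sigma(r = 8 h^-1 Mpc) = sigma_8

def sigma_of_mass(mass):
    """z = 0 rms fluctuation for a halo mass in h^-1 Msun."""
    r = (3.0 * mass / (4.0 * math.pi * RHO_0)) ** (1.0 / 3.0)
    return NORM * sigma_unnormalized(r)

print(sigma_of_mass(2e12), sigma_of_mass(1e13))
```

The fluctuation at redshift z follows from equation (C5) by multiplying by the growth factor D(z), and $d\sigma/dM$ can be obtained by finite differences of `sigma_of_mass`.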

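The halo lifetime is defined to be the median interval before a halo with initial mass M becomes a halo with mass $M_2 = 2M$ via mergers. This condition is given in Lacey & Cole (1993), 

$$ \frac{1}{2}\,\frac{\omega_1 - 2\omega_2}{\omega_1}\,\exp\left[\frac{2\omega_2(\omega_1 - \omega_2)}{S_1}\right]\left\{1 - {\rm erf}\left[\frac{S_2(\omega_1 - 2\omega_2) + S_1\omega_2}{\sqrt{2 S_1 S_2 (S_1 - S_2)}}\right]\right\} + \frac{1}{2}\left\{1 - {\rm erf}\left[\frac{S_1\omega_2 - S_2\omega_1}{\sqrt{2 S_1 S_2 (S_1 - S_2)}}\right]\right\} = 0.5 , \tag{C6} $$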



where $S_1 = \sigma^2(M)$, $S_2 = \sigma^2(2M)$, $\omega_1 = \delta_c(z)$ and $\omega_2 = \delta_c(z_2)$. Hence the halo lifetime is given by $t_H(M,z) = t_U(z_2) - t_U(z)$, where $t_U(z)$ is the age of the universe at redshift z, and $z_2$ is solved numerically from eqn. (C6). For comparison, the age of the universe at z = 3.5 is ~ 2 Gyr. 
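The numerical solution for the median doubling epoch can be sketched with a simple bisection. The sketch assumes the Lacey & Cole (1993) cumulative probability in the form written in the code comment below; the function names and the sample values of $S_1$, $S_2$ and $\omega_1$ are hypothetical. It solves only for $\omega_2 = \delta_c(z_2)$; converting $\omega_2$ to $z_2$ and then to a time interval requires the growth factor and the age-redshift relation, which are not included.

```python
import math

def doubling_probability(S1, S2, w1, w2):
    """Probability that a halo (S1, w1) has at least doubled its mass by w2.

    Assumed form (Lacey & Cole 1993):
      P = 1/2 (w1 - 2 w2)/w1 * exp[2 w2 (w1 - w2)/S1] * erfc(A) + 1/2 erfc(B),
      A = [S2 (w1 - 2 w2) + S1 w2] / sqrt(2 S1 S2 (S1 - S2)),
      B = (S1 w2 - S2 w1) / sqrt(2 S1 S2 (S1 - S2)),
    with S1 > S2 and 0 <= w2 <= w1.
    """
    root = math.sqrt(2.0 * S1 * S2 * (S1 - S2))
    A = (S2 * (w1 - 2.0 * w2) + S1 * w2) / root
    B = (S1 * w2 - S2 * w1) / root
    first = 0.5 * (w1 - 2.0 * w2) / w1 * math.exp(2.0 * w2 * (w1 - w2) / S1) * math.erfc(A)
    return first + 0.5 * math.erfc(B)

def median_omega2(S1, S2, w1, tol=1e-12):
    """Bisect for the w2 at which the doubling probability equals 0.5."""
    lo, hi = 0.0, w1            # P(lo) = 1 and P(hi) = 0 bracket the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if doubling_probability(S1, S2, w1, mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical inputs: S1 = sigma^2(M), S2 = sigma^2(2M), w1 = delta_c(z).
w2 = median_omega2(2.0, 1.5, 5.0)
print(w2, doubling_probability(2.0, 1.5, 5.0, w2))
```

The probability vanishes at $\omega_2 = \omega_1$ (no time elapsed) and tends to unity as $\omega_2 \to 0$, so a root is always bracketed.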

Halos with mass $> M_*$ are more strongly clustered than the underlying distribution of mass; the bias factor b(M,z) of halos with mass M at redshift z is given by (Jing 1998) 

$$ b(M,z) = \left\{1 + \frac{1}{\delta_{c,0}}\left[\frac{\delta_c^2(z)}{\sigma^2(M)} - 1\right]\right\}\left[\frac{\sigma^4(M)}{2\delta_c^4(z)} + 1\right]^{(0.06 - 0.02 n_{\rm eff})} , \tag{C7} $$



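where $n_{\rm eff} = -3 - 6(d\ln\sigma/d\ln M)$ is the effective index of the power spectrum on a mass scale M. The effective bias factor for all halos with mass above the minimal mass $M_{\rm min}$ is therefore 

$$ b_{\rm eff}(M_{\rm min},z) = \left[\int_{M_{\rm min}}^{\infty} dM\,\frac{b(M,z)\,n(M,z)}{t_H(M,z)}\right]\left[\int_{M_{\rm min}}^{\infty} dM\,\frac{n(M,z)}{t_H(M,z)}\right]^{-1} . \tag{C8} $$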



Since n(M,z) drops rapidly with increasing mass, $b_{\rm eff}$ is only slightly larger than $b(M_{\rm min},z)$. We have tested equations (C7) and (C8) with the simulations described above. We find that they correctly predict the bias inferred from the integrated correlation function $\xi_{20}$. In particular, at the two output redshifts of the simulations, z = 3 and z = 4, the simulation results give a bias factor (calculated from the ratio of $\xi_{20}$ for the halos and for the dark matter) of 6.2 at z = 3 and 10.2 at z = 4, for halos with mass $> 2 \times 10^{12}\,h^{-1} M_\odot$, while the analytical bias formalism gives $b_{\rm eff} = 7.3$ and 10.7 respectively. This difference is negligible when we integrate over a wide redshift range (equation C11), and is small compared with other uncertainties. On the other hand, there is clear evidence for a scale-dependent bias, which we plan to explore further in future work. 
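Equations (C7) and (C8) can be illustrated with a short sketch written in terms of the peak height $\nu = \delta_c(z)/\sigma(M)$. The function names are hypothetical, the default $\delta_{c,0} = 1.686$ is an assumed round value rather than the one computed from Navarro, Frenk, & White (1997), and the mass integrals of equation (C8) are approximated by a trapezoidal sum over a user-supplied grid of masses and weights $n(M,z)/t_H(M,z)$.

```python
def jing_bias(nu, n_eff, delta_c0=1.686):
    """Bias of equation (C7), with nu = delta_c(z) / sigma(M).

    delta_c0 = 1.686 is an assumed value for the z = 0 collapse threshold.
    """
    return (1.0 + (nu * nu - 1.0) / delta_c0) * (0.5 / nu ** 4 + 1.0) ** (0.06 - 0.02 * n_eff)

def effective_bias(masses, biases, rates):
    """Trapezoidal estimate of equation (C8).

    masses: increasing grid starting at M_min
    biases: b(M, z) on that grid
    rates:  n(M, z) / t_H(M, z) on that grid (hypothetical inputs)
    """
    num = den = 0.0
    for i in range(len(masses) - 1):
        dM = masses[i + 1] - masses[i]
        num += 0.5 * dM * (biases[i] * rates[i] + biases[i + 1] * rates[i + 1])
        den += 0.5 * dM * (rates[i] + rates[i + 1])
    return num / den

print(jing_bias(3.0, -2.0), jing_bias(4.0, -2.0))
```

Because the weights fall steeply with mass, the weighted average stays close to the bias at the lower limit of the grid, which is the behavior noted in the text.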
The model-predicted quasar correlation function $\xi_{\rm model}(r,z)$ is therefore 

$$ \xi_{\rm model}(r,z) = b_{\rm eff}^2\,\xi_m(r,z) = b_{\rm eff}^2\,\xi_m(r)\,D^2(z) , \tag{C9} $$

where D(z) is the linear growth factor of fluctuations, and $\xi_m(r)$ is the present-day mass correlation function, defined as 

$$ \xi_m(r) = \frac{1}{2\pi^2}\int_0^\infty dk\,k^2 P(k)\,\frac{\sin kr}{kr} , \tag{C10} $$



normalized using $\sigma_8$. Comparison of $\xi_m(r,z)$ with the mass correlation function from the cosmological N-body simulation mentioned above at z = 3 and z = 4 shows quite good agreement. 
The correlation function we have actually measured is averaged over a certain redshift range, hence 

$$ \bar{\xi}(r) = \frac{\int dV_c\,n_{\rm QSO}^2(z)\,\xi_{\rm model}(r,z)}{\int dV_c\,n_{\rm QSO}^2(z)} , \tag{C11} $$



where $n_{\rm QSO}(z) = \Phi(z) f(z)$ is the observed quasar number density, i.e., the actual quasar number density times the complicated selection function f(z); and $dV_c$ is the differential comoving volume element, given in Hogg (1999). $n_{\rm QSO}$ is computed using our full high-redshift clustering subsample; see Figure 2. Note that the above equation is only valid for scales r over which $n_{\rm QSO}$ is nearly constant and $\xi$ does not significantly evolve over the time $r/[(1+z)c]$ (PMN04). For our selected range $[r_{\rm min}, r_{\rm max}] = [5, 20]\,h^{-1}$ Mpc, these conditions are satisfied. 
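The redshift averaging in equation (C11) amounts to a weighted mean with weights $dV_c\,n_{\rm QSO}^2(z)$. A minimal discrete sketch is given below; the function name and the binned inputs are hypothetical, and the integrals are replaced by sums over redshift bins.

```python
def redshift_averaged_xi(xi_model, n_qso, dV_c):
    """Discrete version of equation (C11) at fixed separation r.

    xi_model: model correlation function in each redshift bin
    n_qso:    observed quasar number density in each bin
    dV_c:     comoving volume of each bin
    All three are equal-length sequences (hypothetical binned inputs).
    """
    weights = [dv * n * n for dv, n in zip(dV_c, n_qso)]
    return sum(w * xi for w, xi in zip(weights, xi_model)) / sum(weights)

# Two bins of equal volume: the denser bin dominates the average.
print(redshift_averaged_xi([1.0, 3.0], [2.0, 1.0], [1.0, 1.0]))
```

Because the weights scale as the square of the observed number density, redshift bins with few quasars contribute little to the averaged correlation function.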



